Preface to Fourth Edition


In 1977 I asked Dr. Walpole to collaborate with me on the third edition of this 
book, of which previous editions had been published in 1962 and 1971, with the 
hope that he would assume the responsibility for future editions if and when the 
need would arise. It was a pleasure working with him on the third edition, 
published in 1980, but otherwise things did not work out as planned. He had 
finished revising the first six chapters for a new fourth edition when he passed 
away in the spring of 1985. He had also conducted an informal survey of professors 
who had taught with our text, and as a result of this survey he intended to 
introduce Jacobians in connection with functions of several random variables, 
improve the formulation of the hypotheses used in the analysis of contingency 
tables, expand the material on multiple regression using matrix notation, and 
include tables for some of the most popular nonparametric tests. In addition to 
numerous minor changes, corrections, and additions, I followed these suggestions 
and added some optional material on Jacobians to Chapter 7. Also, the hypotheses 
we test in the analysis of contingency tables have been stated more rigorously; 
there is an optional section treating multiple regression in matrix notation; and 
the nonparametric material has been expanded to include small-sample tests 
based on special tables. This more or less reflects what Dr. Walpole had hoped 
to get done. 

In addition to the acknowledgments in the Preface to the third edition, 
which apply also to the fourth edition, I am indebted to Prentice-Hall, Inc., for 
permission to reproduce part of Table 2 from R. A. Johnson and D. W. Wichern's
Applied Multivariate Statistical Analysis; to the American Cyanamid Company to
reproduce the material in Table IX; to the Addison-Wesley Publishing Company 
to base Table X on Table 11.4 of D. B. Owen's Handbook of Statistical Tables; 
to the editor of the Annals of Mathematical Statistics to reproduce the material 
in Table XI; and to MINITAB for permission to reproduce the computer printouts 
shown in Figures 14.4 and 14.6. 

I would also like to express my appreciation to Mrs. Rita Ewer for carefully 
reading the manuscript and helping with the proofreading, to Mrs. R. E. Walpole 
and Professor Susan Smith for helping with the proofreading, and to Zita de 
Schauensee, Prentice-Hall production editor, for her courteous cooperation in 
the production of this book. 


Scottsdale, Arizona John E. Freund 


— 


Preface to Third Edition 


Like its first and second editions, this book is designed for a two-semester or 
three-quarter calculus-based introduction to mathematical statistics. Most of the
differences between this edition and the preceding ones reflect the changes that 
have taken place in recent years in statistical thinking, and in the teaching of 
statistics. Also, there have been extensive changes in format, which should make 
the book easier to read and easier to teach. 

In addition to substantial changes in notation, the basic material on distribu- 
tion theory has been reorganized; there is a new chapter combining the material 
on functions of random variables; the theoretical and applied aspects of estimation 
have been expanded and placed in two chapters; an expanded coverage is given 
to nonparametric statistics; the introduction to analysis of variance has been 
rewritten with more emphasis on the concepts of experimental design; the material 
on Boolean Algebra has been placed into an appendix; and there are many new 
exercises and illustrations. 

The authors would like to express their appreciation for the many construc- 
tive comments which they have received from their colleagues; also, they would 
like to express their appreciation to the McGraw-Hill Book Company for their 
permission to reproduce in Table II material from their Handbook of Probability 
and Statistics with Tables, and to Professor E. S. Pearson and the Biometrika 
trustees for their permission to reproduce the material in Tables V and VI. 


John E. Freund 
Ronald E. Walpole 
Introduction


1.1 INTRODUCTION


In recent years, the growth of statistics has made itself felt in almost every phase 
of human activity. Statistics no longer consists merely of the collection of data 
and their presentation in charts and tables—it is now considered to encompass 
the science of basing inferences on observed data and the entire problem of 
making decisions in the face of uncertainty. This covers considerable ground 
since uncertainties are met when we flip a coin, when a dietician experiments 
with food additives, when an actuary determines life insurance premiums, when 
a quality control engineer accepts or rejects manufactured products, when a 
teacher compares the abilities of students, when an economist forecasts trends, 
when a newspaper predicts an election, and so forth. 

It would be presumptuous to say that statistics, in its present state of 
development, can handle all situations involving uncertainties, but new techniques 
are constantly being developed, and modern statistics can, at least, provide the
framework for looking at these situations in a logical and systematic fashion. In 
other words, statistics provides the models that are needed to study situations 
involving uncertainties, in the same way as calculus provides the models that are 
needed to describe, say, the concepts of Newtonian physics. 

The beginnings of the mathematics of statistics may be found in mid- 
eighteenth-century studies in probability motivated by interest in games of chance. 
The theory thus developed for “heads or tails” or “red or black” soon found
applications in situations where the outcomes were “boy or girl,” “life or death,”
or “pass or fail,” and scholars began to apply probability theory to actuarial
problems and some aspects of the social sciences. Later, probability and statistics 
were introduced into physics by L. Boltzmann, J. Gibbs, and J. Maxwell, and in 
this century they have found applications in all phases of human endeavor which 
in some way involve an element of uncertainty or risk. The names which are 
connected most prominently with the growth of mathematical statistics in the 
first half of this century are those of R. A. Fisher, J. Neyman, E. S. Pearson, and 
A. Wald. More recently, the work of R. Schlaifer, L. J. Savage, and others, has 
given impetus to statistical theories based essentially on methods which date back 
to the eighteenth-century English clergyman Thomas Bayes. 

The approach to statistics presented in this book is essentially the classical 
approach, with methods of inference based largely on the work of J. Neyman 
and E. S. Pearson. However, the more general decision-theory approach is 
introduced in Chapter 9 and some Bayesian methods are presented in Chapter 10. 


1.2 COMBINATORIAL METHODS


In many problems of statistics we must list all the alternatives that are possible 
in a given situation, or at least determine how many different possibilities there 
are. In connection with the latter, we often use the following theorem, sometimes 
called the "multiplication rule" for possibilities or choices: 


THEOREM 1.1 If an operation consists of two steps, of which the first can
be made in $n_1$ ways and for each of these the second can be made in $n_2$
ways, then the whole operation can be made in $n_1 \cdot n_2$ ways.


Here, “operation” stands for any kind of procedure, process, or task. 
To justify this theorem, let us define the ordered pair $(x_i, y_j)$ to be the
outcome which arises when the first step results in possibility $x_i$ and the second
step results in possibility $y_j$. Then, the set of all possible outcomes is composed
of the following $n_1 \cdot n_2$ pairs:


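$$
\begin{array}{cccc}
(x_1, y_1), & (x_1, y_2), & \ldots, & (x_1, y_{n_2}) \\
(x_2, y_1), & (x_2, y_2), & \ldots, & (x_2, y_{n_2}) \\
\vdots & \vdots & & \vdots \\
(x_{n_1}, y_1), & (x_{n_1}, y_2), & \ldots, & (x_{n_1}, y_{n_2})
\end{array}
$$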




EXAMPLE 1.1 


Suppose that someone wants to go by bus, by train, or by plane on a week's 
vacation to one of the five East North Central States. Find the number of different 
ways in which this can be done. 


Solution 


The particular state can be chosen in $n_1 = 5$ ways and the means of
transportation can be chosen in $n_2 = 3$ ways. Therefore, the trip can be
carried out in $5 \cdot 3 = 15$ possible ways. If an actual listing of all the
possibilities is desirable, a tree diagram like that in Figure 1.1 provides a
systematic approach. This diagram shows that there are $n_1 = 5$ branches
(possibilities) for the different states and for each of these branches there
are $n_2 = 3$ branches (possibilities) for the different means of transportation.
It is apparent that the 15 possible ways of taking the vacation are represented
by the 15 distinct paths along the branches of the tree. ▲


EXAMPLE 1.2 


How many possible outcomes are there when a red die and a green die are thrown?


Solution 


The red die can land in any one of six ways, and for each of these six ways 
the green die can also land in six ways. Therefore, the pair of dice can land
in $6 \cdot 6 = 36$ ways. ▲


Theorem 1.1 may be extended to cover situations where an operation consists 
of any fixed number of steps. The general case is stated in the following theorem: 


THEOREM 1.2 If an operation consists of k steps, of which the first can be
made in $n_1$ ways, for each of these the second step can be made in $n_2$ ways,
for each of the first two the third step can be made in $n_3$ ways, and so forth,
then the whole operation can be made in $n_1 \cdot n_2 \cdot \ldots \cdot n_k$ ways.


EXAMPLE 1.3 


How many different lunches are possible consisting of a soup, a sandwich, a 
dessert, and a drink if one can select from 4 different soups, 3 kinds of sandwiches,
5 desserts, and 4 drinks? 
Figure 1.1 Tree diagram.


Solution 


The total number of lunches would be $4 \cdot 3 \cdot 5 \cdot 4 = 240$. ▲


EXAMPLE 1.4 


In how many ways can one mark a true-false test consisting of 20 questions? 
Solution


If a true-false test consists of 20 questions, there are 


$$\underbrace{2 \cdot 2 \cdot 2 \cdot 2 \cdot \ldots \cdot 2 \cdot 2}_{20 \text{ factors}} = 2^{20} = 1{,}048{,}576$$

different ways in which one can mark the test, and only one of these
corresponds to the case where each answer is correct. ▲
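The multiplication rule is easy to check by machine for small cases. The Python sketch below is only an illustration (the helper names `count_by_rule` and `count_by_listing` are ours, not the book's): the first multiplies the numbers of choices as in Theorem 1.2, the second lists every outcome the way a tree diagram does. It assumes, as Theorem 1.2 does, that the number of choices at each step does not depend on which choices were made earlier.

```python
from itertools import product
from math import prod


def count_by_rule(step_counts):
    """Theorem 1.2: multiply the numbers of choices available at each step."""
    return prod(step_counts)


def count_by_listing(step_counts):
    """List every sequence of choices, as a tree diagram would, and count them."""
    choices = [range(n) for n in step_counts]
    return sum(1 for _ in product(*choices))


# Example 1.3: 4 soups, 3 sandwiches, 5 desserts, and 4 drinks.
lunch = [4, 3, 5, 4]
print(count_by_rule(lunch), count_by_listing(lunch))  # 240 240

# Example 1.4: 20 true-false questions with 2 choices each.
print(count_by_rule([2] * 20))  # 1048576
```

Listing is only practical for small problems; for the 20-question test the rule gives the answer at once, while a listing would have to generate over a million outcomes.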


Frequently, we are interested in situations where the outcomes are the 
different orders or arrangements that are possible for a group of objects. For 
example, we might want to know how many different arrangements are possible 
for electing the president, vice-president, treasurer, and secretary from the 24 
members of a club, or we might want to know how many different arrangements 
are possible for seating 6 persons around a table. Different arrangements like 
these are called permutations. 


EXAMPLE 1.5 


How many permutations are there of all three of the letters a, b, and c? 


Solution 


The possible arrangements are abc, acb, bac, bca, cab, and cba, so the number 
of distinct permutations is six. Using Theorem 1.2, we could have arrived 
at this answer without actually listing the different permutations. Since there 
are three choices to select a letter for the first position, then two for the 
second position, leaving only one letter for the third position, the total 
number of permutations is $3 \cdot 2 \cdot 1 = 6$. ▲


Generalizing the argument used in this example, we find that n distinct
objects can be arranged in $n(n-1)(n-2) \cdot \ldots \cdot 3 \cdot 2 \cdot 1$ ways. We represent
this product by the symbol $n!$, which is read “n factorial.” Thus, $1! = 1$,
$2! = 2 \cdot 1 = 2$, $3! = 3 \cdot 2 \cdot 1 = 6$, and so on. By definition, $0! = 1$.
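A direct transcription of this definition into Python (the function name is ours) multiplies the integers from 2 up to n; when n is 0 or 1 the loop does not run and the result stays 1, which agrees with the convention $0! = 1$. The standard library's `math.factorial` serves as a check.

```python
from math import factorial


def factorial_by_product(n):
    """n! = n(n-1)...3*2*1; for n = 0 or 1 the loop is empty and the result is 1."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


print([factorial_by_product(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
print(all(factorial_by_product(n) == factorial(n) for n in range(20)))  # True
```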


THEOREM 1.3 The number of permutations of n distinct objects is $n!$.


EXAMPLE 1.6 


How many different orders are possible for introducing the 5 starting players of 
a basketball team to the public? 
Solution


There are $5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$ different orders for introducing the
starting lineup. ▲


EXAMPLE 1.7 


The number of permutations of the four letters a, b, c, and d is 24, but what is 
the number of permutations if we take only two of the four letters, or as it is 
usually put, if we take the four letters two at a time? 


Solution 


Again using Theorem 1.2, we find that we have two positions to fill with 
four choices for the first and then three choices for the second for a total 
of $4 \cdot 3 = 12$ permutations. ▲


Generalizing the argument used in this example, we find that n distinct
objects taken r at a time, for $r > 0$, can be arranged in $n(n-1) \cdot \ldots \cdot (n-r+1)$
ways. We represent this product by the symbol ${}_nP_r$. Letting ${}_nP_0 = 1$ by definition,
we can write the following theorem:


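THEOREM 1.4 The number of permutations of n distinct objects taken r at
a time is

$${}_nP_r = \frac{n!}{(n-r)!}$$

for $r = 0, 1, 2, \ldots, n$.

Proof. For $r = 0$, we have

$${}_nP_0 = 1 = \frac{n!}{n!}$$

When $r = 1, 2, \ldots, n$, we can write

$$
\begin{aligned}
{}_nP_r &= n(n-1)(n-2) \cdot \ldots \cdot (n-r+1) \\
&= \frac{n(n-1)(n-2) \cdot \ldots \cdot (n-r+1)(n-r)!}{(n-r)!} \\
&= \frac{n!}{(n-r)!}
\end{aligned}
$$
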
In applications involving permutations, it is generally easier to proceed by
using Theorem 1.2 as in Example 1.7, but the factorial formula of Theorem 1.4
is easier to remember and more easily programmed for solution on a computer.
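As the text notes, the factorial formula is the easier one to program. The minimal Python sketch below (function names ours) computes ${}_nP_r$ from Theorem 1.4, compares it with a brute-force listing made with `itertools.permutations`, and with `math.perm` from the standard library (available in Python 3.8 and later).

```python
from itertools import permutations
from math import factorial, perm


def npr(n, r):
    """Theorem 1.4: permutations of n distinct objects taken r at a time, n!/(n-r)!."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return factorial(n) // factorial(n - r)


def npr_by_listing(n, r):
    """Count the ordered selections of r distinct objects out of n by listing them."""
    return sum(1 for _ in permutations(range(n), r))


print(npr(24, 4))                        # Example 1.8: 255024
print(npr(5, 3), npr_by_listing(5, 3))   # Example 1.9: 60 60
print(perm(5, 3))                        # standard-library check: 60
```

Integer division is exact here because $(n-r)!$ always divides $n!$.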


EXAMPLE 1.8 


Four names are drawn from the 24 members of a club for the offices of president, 
vice-president, treasurer, and secretary. In how many different ways can this be 
done? 


Solution 


The number of permutations of 24 distinct objects taken 4 at a time is 


$${}_{24}P_4 = \frac{24!}{20!} = 24 \cdot 23 \cdot 22 \cdot 21 = 255{,}024$$
▲


EXAMPLE 1.9 


In how many ways can a local chapter of the American Chemical Society schedule 
three speakers for three different meetings, if they are all available on any of five 
possible dates? 


Solution 


The number of permutations of 5 distinct objects taken 3 at a time is

$${}_5P_3 = \frac{5!}{2!} = 5 \cdot 4 \cdot 3 = 60$$
▲

Permutations that occur when objects are arranged in a circle are called 
circular permutations. Two circular permutations are not considered different if 
corresponding objects in the two arrangements are preceded and followed by the 
same objects as we proceed in a clockwise direction. For example, if four persons
are playing bridge, we do not get a new permutation if they all move one position 
in a clockwise direction. 


EXAMPLE 1.10 


How many circular permutations are there of four persons playing bridge? 
Solution


By considering one person in a fixed position and arranging the other three
in $3!$ ways, we find that there are six different arrangements (circular
permutations) of four persons playing bridge. ▲


Generalizing the argument used in this example, we get the result stated in 
the following theorem: 


THEOREM 1.5 The number of permutations of n distinct objects arranged
in a circle is $(n-1)!$.
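To see Theorem 1.5 at work, one can list all $n!$ seatings and identify those that differ only by a rotation. In the sketch below (helper names ours), each seating is rotated so that it starts with its smallest label; since the labels are distinct, two seatings are the same circular permutation exactly when they give the same rotated form. As in the text, only rotations are identified, so a seating and its mirror image count as different.

```python
from itertools import permutations
from math import factorial


def canonical_rotation(seating):
    """Rotate a seating so that it starts with its smallest label."""
    i = seating.index(min(seating))
    return seating[i:] + seating[:i]


def circular_count(n):
    """Seatings of n people around a table, counting rotations of a seating only once."""
    return len({canonical_rotation(p) for p in permutations(range(n))})


print(circular_count(4), factorial(3))  # Example 1.10: 6 6
```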


Throughout our discussion it has been assumed that the n objects from 
which we select r objects and form permutations are all distinct. Thus, our results 
cannot be used, for example, to determine the number of ways in which we can 
arrange the letters in the word "book," or the letters in the word "receive." 


EXAMPLE 1.11 


How many permutations are there of the letters in the word “book”? 


Solution 


If we distinguish for the moment between the two o's by labeling them $o_1$
and $o_2$, there are $4! = 24$ different permutations of the symbols b, $o_1$, $o_2$,
and k. However, if we drop the subscripts, then $bo_1ko_2$ and $bo_2ko_1$, for
instance, both yield boko, and since each pair of permutations with subscripts
yields but one arrangement without subscripts, the total number of arrangements
of the letters in the word “book” is $\frac{24}{2} = 12$. ▲


EXAMPLE 1.12 


How many permutations are there of the letters in the word "receive"? 


Solution 


With subscripts on the e's there are $7!$ permutations of the letters in the
word “receive,” but since there are $3! = 6$ permutations of $e_1$, $e_2$, and $e_3$
which lead to the same arrangement of the letters in “receive,” there are
only $\frac{7!}{3!} = 7 \cdot 6 \cdot 5 \cdot 4 = 840$ different arrangements of the letters in the
word “receive.” ▲


Generalizing the argument used in these two examples, we get the result 
stated in the following theorem: 


THEOREM 1.6 The number of permutations of n objects of which $n_1$ are of
one kind, $n_2$ are of a second kind, ..., $n_k$ are of a kth kind, and
$n_1 + n_2 + \cdots + n_k = n$, is

$$\frac{n!}{n_1! \cdot n_2! \cdot \ldots \cdot n_k!}$$


EXAMPLE 1.13 


In how many ways can 2 oaks, 3 pines, and 2 maples be arranged in a straight 
line if one does not distinguish between trees of the same kind? 


Solution 
The total number of distinct arrangements is

$$\frac{7!}{2! \cdot 3! \cdot 2!} = 210$$
▲
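Theorem 1.6 can be checked against brute force: generate all $n!$ orderings, including those that differ only by swapping identical items, and keep the distinct ones. The sketch below (function names ours) does this for Examples 1.11 to 1.13; for the trees of Example 1.13 we use the letters O, P, and M for oak, pine, and maple, an encoding chosen only for illustration.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def distinct_arrangements(items):
    """Theorem 1.6: n!/(n1! n2! ... nk!), where the ni count the items of each kind."""
    kinds = Counter(items).values()
    return factorial(len(items)) // prod(factorial(c) for c in kinds)


def distinct_by_listing(items):
    """Generate all n! orderings and keep only the distinct ones."""
    return len(set(permutations(items)))


print(distinct_arrangements("book"), distinct_by_listing("book"))        # 12 12
print(distinct_arrangements("receive"), distinct_by_listing("receive"))  # 840 840
# Example 1.13, with O = oak, P = pine, M = maple (our labels).
print(distinct_arrangements("OOPPPMM"))  # 210
```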


Often we are interested in determining the number of ways of selecting r 
objects from among n distinct objects without regard to the order in which they 
are selected. Such selections are called combinations. 


EXAMPLE 1.14 


In how many ways can a person gathering data for a market research organization 
interview 3 of the 20 families living in a certain apartment house? 


Solution 
If we cared about the order in which the families are interviewed, the answer 
would be 
$${}_{20}P_3 = 20 \cdot 19 \cdot 18 = 6{,}840$$

but each set of 3 families would then be counted $3! = 6$ times. If we are
not interested in the order in which the 3 families are interviewed, there
are thus only $\frac{6{,}840}{6} = 1{,}140$ ways in which 3 of the 20 families can be
selected. ▲


Actually, “combination” means the same as “subset,” and when we ask for 
the number of combinations of r objects selected from a set of n distinct objects, 
we are simply asking for the total number of subsets of r objects that can be 
selected from a set of n distinct objects. In general, there are $r!$ permutations of
the objects in a subset of r objects, so that the ${}_nP_r$ permutations of r objects
selected from a set of n distinct objects contain each subset $r!$ times. Dividing
${}_nP_r$ by $r!$ and denoting the result by the symbol $\binom{n}{r}$, we thus have:


THEOREM 1.7 The number of combinations of r objects selected from a set
of n distinct objects is

$$\binom{n}{r} = \frac{n!}{r!(n-r)!}$$

for $r = 0, 1, 2, \ldots, n$.


EXAMPLE 1.15 


In how many different ways can 6 tosses of a coin yield 2 heads and 4 tails? 


Solution 


This question is equivalent to asking in how many different ways one can
select the 2 tosses on which heads is to occur; applying Theorem 1.7, we
thus find the answer to be

$$\binom{6}{2} = \frac{6!}{2! \cdot 4!} = 15$$

This result could also have been obtained by the rather tedious process of
enumerating the various possibilities, HHTTTT, TTHTHT, HTHTTT, ...,
where H stands for head and T for tail. ▲




EXAMPLE 1.16 


How many committees of two chemists and one physicist can be formed from 
four chemists and three physicists? 


Solution 
Since two of four chemists can be selected in $\binom{4}{2} = \frac{4!}{2! \cdot 2!} = 6$ ways and
one of three physicists can be selected in $\binom{3}{1} = \frac{3!}{1! \cdot 2!} = 3$ ways, Theorem
1.1 shows that the number of committees is $6 \cdot 3 = 18$. ▲
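The combination formula of Theorem 1.7 and the examples built on it can be checked in a few lines. The sketch below is illustrative (the helper `ncr` and the labels C1, ..., P3 are ours): it compares the factorial formula with the standard library's `math.comb`, recounts Example 1.15 by listing all $2^6$ toss sequences, and lists the committees of Example 1.16 with `itertools.combinations`.

```python
from itertools import combinations, product
from math import comb, factorial


def ncr(n, r):
    """Theorem 1.7: combinations of r objects from n distinct objects, n!/(r!(n-r)!)."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return factorial(n) // (factorial(r) * factorial(n - r))


print(ncr(20, 3))                    # Example 1.14: 1140
print(ncr(6, 2))                     # Example 1.15: 15

# Example 1.15 by listing: sequences of 6 tosses with exactly 2 heads.
two_heads = [s for s in product("HT", repeat=6) if s.count("H") == 2]
print(len(two_heads))                # 15

# Example 1.16 by listing every committee of 2 chemists and 1 physicist.
chemists = ["C1", "C2", "C3", "C4"]
physicists = ["P1", "P2", "P3"]
committees = [c + p for c in combinations(chemists, 2)
              for p in combinations(physicists, 1)]
print(len(committees), ncr(4, 2) * ncr(3, 1))   # 18 18
```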


A combination of r objects selected from a set of n distinct objects may be 
considered a partition of the n objects into two subsets containing, respectively, 
the r objects that are selected and the n — r objects that are left. Often, we are 
concerned with the more general problem of partitioning a set of n distinct objects 
into k subsets, which requires that each of the n objects must belong to one and 
only one of the subsets.† The order of the objects within a subset is of no
importance. 


EXAMPLE 1.17 


In how many ways can a set of four objects be partitioned into three subsets 
containing, respectively, 2, 1, and 1 of the objects? 


Solution 


Denoting the four objects by a, b, c, and d, we find by enumeration that 
there are the twelve possibilities: 


ab|c|d   ab|d|c   ac|b|d   ac|d|b
ad|b|c   ad|c|b   bc|a|d   bc|d|a
bd|a|c   bd|c|a   cd|a|b   cd|b|a


The number of partitions for this example is denoted by the symbol

$$\binom{4}{2, 1, 1} = 12$$

where the number at the top represents the total number of objects and the
numbers at the bottom represent the number of objects going into each
subset. ▲


† Symbolically (see Appendix I at the end of the book), the subsets $A_1, A_2, \ldots,$
and $A_k$ constitute a partition of set A if $A_1 \cup A_2 \cup \cdots \cup A_k = A$ and
$A_i \cap A_j = \varnothing$ for all $i \neq j$.


In general, we have the following theorem: 


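THEOREM 1.8 The number of ways of partitioning a set of n distinct objects
into k subsets with $n_1$ objects in the first subset, $n_2$ objects in the second
subset, ..., and $n_k$ objects in the kth subset, is

$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1! \cdot n_2! \cdot \ldots \cdot n_k!}$$

Proof. First we note that there are $\binom{n}{n_1}$ ways to form the first subset.
For each of these there are $\binom{n-n_1}{n_2}$ ways to form the second subset, for
each first and second subset there are $\binom{n-n_1-n_2}{n_3}$ ways to form the third
subset, and so forth. Hence, by Theorem 1.2 it follows that

$$
\begin{aligned}
\binom{n}{n_1, n_2, \ldots, n_k}
&= \binom{n}{n_1} \binom{n-n_1}{n_2} \cdots \binom{n-n_1-\cdots-n_{k-1}}{n_k} \\
&= \frac{n!}{n_1! \cdot (n-n_1)!} \cdot \frac{(n-n_1)!}{n_2! \cdot (n-n_1-n_2)!} \cdot \ldots \cdot \frac{(n-n_1-\cdots-n_{k-1})!}{n_k! \cdot 0!} \\
&= \frac{n!}{n_1! \cdot n_2! \cdot \ldots \cdot n_k!}
\end{aligned}
$$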

EXAMPLE 1.18 


In how many ways can seven scientists be assigned to one triple and two double 
hotel rooms? 
Solution

Substitution of $n = 7$, $n_1 = 3$, $n_2 = 2$, and $n_3 = 2$ into the formula of
Theorem 1.8 yields $\binom{7}{3, 2, 2} = \frac{7!}{3! \cdot 2! \cdot 2!} = 210$. ▲
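The proof of Theorem 1.8 builds the partition one subset at a time, and that construction can be coded directly. The Python sketch below (all names ours) computes the multinomial coefficient from factorials, recomputes it step by step as in the proof, and, for Example 1.17, lists the partitions by cutting every ordering of the objects into consecutive blocks and ignoring the order inside each block. Like the theorem, it treats the subsets as labeled (first, second, ...), so ab|c|d and ab|d|c are different partitions.

```python
from itertools import permutations
from math import comb, factorial, prod


def multinomial(*sizes):
    """Theorem 1.8: n!/(n1! n2! ... nk!) with n = n1 + n2 + ... + nk."""
    return factorial(sum(sizes)) // prod(factorial(s) for s in sizes)


def multinomial_by_steps(*sizes):
    """As in the proof: choose the first subset, then the next from what is left, ..."""
    remaining, total = sum(sizes), 1
    for s in sizes:
        total *= comb(remaining, s)
        remaining -= s
    return total


def partitions_by_listing(objects, sizes):
    """Cut each ordering into consecutive blocks; order inside a block is ignored."""
    seen = set()
    for p in permutations(objects):
        blocks, start = [], 0
        for s in sizes:
            blocks.append(frozenset(p[start:start + s]))
            start += s
        seen.add(tuple(blocks))
    return len(seen)


print(multinomial(2, 1, 1), partitions_by_listing("abcd", (2, 1, 1)))  # Example 1.17: 12 12
print(multinomial(3, 2, 2), multinomial_by_steps(3, 2, 2))             # Example 1.18: 210 210
```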


1.3 BINOMIAL COEFFICIENTS


If n is a positive integer and we multiply out $(x+y)^n$ term by term, each term
will be the product of x's and y's, with an x or a y coming from each of the n
factors $x + y$. For instance, the expansion

$$
\begin{aligned}
(x+y)^3 &= (x+y)(x+y)(x+y) \\
&= x \cdot x \cdot x + x \cdot x \cdot y + x \cdot y \cdot x + x \cdot y \cdot y \\
&\quad + y \cdot x \cdot x + y \cdot x \cdot y + y \cdot y \cdot x + y \cdot y \cdot y \\
&= x^3 + 3x^2y + 3xy^2 + y^3
\end{aligned}
$$

yields terms of the form $x^3$, $x^2y$, $xy^2$, and $y^3$. Their coefficients are 1, 3, 3, and
1, and the coefficient of $xy^2$, for example, is $\binom{3}{2} = 3$, the number of ways in
which we can choose the two factors providing the y's. Similarly, the coefficient
of $x^2y$ is $\binom{3}{1} = 3$, the number of ways in which we can choose the one factor
providing the y, and the coefficients of $x^3$ and $y^3$ are $\binom{3}{0} = 1$ and $\binom{3}{3} = 1$.

More generally, if n is a positive integer and we multiply out $(x+y)^n$ term
by term, the coefficient of $x^{n-r}y^r$ is $\binom{n}{r}$, the number of ways in which we can
choose the r factors providing the y's. Accordingly, we refer to $\binom{n}{r}$ as a binomial
coefficient. We can now state the following theorem:


THEOREM 1.9

$$(x+y)^n = \sum_{r=0}^{n} \binom{n}{r} x^{n-r} y^r \qquad \text{for any positive integer } n$$
(In case the reader is not familiar with the $\sum$ notation, he will find a brief
explanation in Appendix II at the end of the book.)
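The argument leading to Theorem 1.9, in which each term of the product picks x or y from every factor, can be imitated by brute force. The sketch below (function name ours) multiplies out $(x+y)^n$ by listing all $2^n$ choices and counting how many of them supply exactly r y's; the counts agree with `math.comb`.

```python
from collections import Counter
from itertools import product
from math import comb


def expand_by_choosing_factors(n):
    """Multiply out (x + y)^n term by term.

    Each term picks x or y from every one of the n factors; entry r of the result
    is the number of terms that pick y exactly r times, i.e. the coefficient of
    x^(n-r) y^r.
    """
    counts = Counter(choice.count("y") for choice in product("xy", repeat=n))
    return [counts[r] for r in range(n + 1)]


print(expand_by_choosing_factors(3))     # [1, 3, 3, 1]
print([comb(3, r) for r in range(4)])    # [1, 3, 3, 1]
```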


The calculation of binomial coefficients can often be simplified by making 
use of the three theorems which follow. 


THEOREM 1.10 For any positive integers n and $r = 0, 1, 2, \ldots, n$,

$$\binom{n}{r} = \binom{n}{n-r}$$

Proof. We might argue that when we select a subset of r objects
from a set of n distinct objects we leave a subset of $n-r$ objects, and,
hence, there are as many ways of selecting r objects as there are ways of
leaving (or selecting) $n-r$ objects. To prove the theorem algebraically, we
write

$$\binom{n}{n-r} = \frac{n!}{(n-r)!\,[n-(n-r)]!} = \frac{n!}{(n-r)!\,r!} = \frac{n!}{r!\,(n-r)!} = \binom{n}{r}$$

Theorem 1.10 implies that if we calculate the binomial coefficients for
$r = 0, 1, \ldots, \frac{n}{2}$ when n is even and for $r = 0, 1, \ldots, \frac{n-1}{2}$ when n is odd, the remaining
binomial coefficients can be obtained by making use of the theorem.
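The saving described here is easy to express in code. In the sketch below (function name ours), `math.comb` merely stands in for looking up the first half of a row in a table, and Theorem 1.10 supplies the rest by mirroring; for even n the single middle entry is not repeated, while for odd n the two middle entries are equal.

```python
from math import comb


def binomial_row(n):
    """All of C(n, 0), ..., C(n, n), looking up only r = 0, 1, ..., n // 2.

    math.comb stands in for the table; Theorem 1.10, C(n, r) = C(n, n - r),
    gives the remaining entries as a mirror image of the first half.
    """
    half = [comb(n, r) for r in range(n // 2 + 1)]
    # For even n the middle entry is not repeated; for odd n the whole half is.
    mirrored = half[: (n + 1) // 2][::-1]
    return half + mirrored


print(binomial_row(4))  # [1, 4, 6, 4, 1]
print(binomial_row(5))  # [1, 5, 10, 10, 5, 1]
```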


EXAMPLE 1.19 
Given $\binom{4}{0} = 1$ and $\binom{4}{1} = 4$, find $\binom{4}{3}$ and $\binom{4}{4}$.


Solution


$\binom{4}{3} = \binom{4}{4-3} = \binom{4}{1} = 4$ and $\binom{4}{4} = \binom{4}{4-4} = \binom{4}{0} = 1$. ▲


EXAMPLE 1.20 


cnn (8) (e oem Qi 
Solution


It is precisely in this fashion that Theorem 1.10 may have to be used in connection 
with Table VII at the end of the book. 


EXAMPLE 1.21 
у 20 17 
Еіпа i and mr 


Solution 


20 
To find (5). we make use of the fact that (3) - (ny look up ( 2 


and get з) = 125,970; to find (ca we make use of the fact that 


T 
(7) (Dose (5) enema 


THEOREM 1.11 For any positive integer n and $r = 1, 2, \ldots, n-1$,

$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$$


Proof. Substituting $x = 1$ into $(x+y)^n$, let us write

$$(1+y)^n = (1+y)(1+y)^{n-1} = (1+y)^{n-1} + y(1+y)^{n-1}$$

and equate the coefficient of $y^r$ in $(1+y)^n$ with that in $(1+y)^{n-1} + y(1+y)^{n-1}$.
Since the coefficient of $y^r$ in $(1+y)^n$ is $\binom{n}{r}$ and the coefficient
of $y^r$ in $(1+y)^{n-1} + y(1+y)^{n-1}$ is the sum of the coefficient of $y^r$ in
$(1+y)^{n-1}$, namely, $\binom{n-1}{r}$, and the coefficient of $y^{r-1}$ in $(1+y)^{n-1}$,
namely, $\binom{n-1}{r-1}$, we obtain

$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$$

which completes the proof. ▼


Theorem 1.11 can also be proved by expressing the binomial coefficients 
on both sides of the equation in terms of factorials and then proceeding algebrai- 
cally, but we shall leave this to the reader in Exercise 6 on page 20. One important 
application of Theorem 1.11 is given in Exercise 5 on page 19, where it provides 
the key for the construction of what is known as Pascal's triangle. 
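Theorem 1.11 is exactly the rule used to build Pascal's triangle: each interior entry is the sum of the two nearest entries in the row above. The short Python sketch below (function name ours) builds the first rows this way using nothing but additions, and then compares them with `math.comb`.

```python
from math import comb


def pascal_triangle(rows):
    """First `rows` rows (rows >= 1) of Pascal's triangle, using only additions.

    By Theorem 1.11, C(n, r) = C(n-1, r) + C(n-1, r-1): each interior entry is
    the sum of the two nearest entries in the row above, and every row starts
    and ends with 1.
    """
    triangle = [[1]]
    for _ in range(rows - 1):
        above = triangle[-1]
        interior = [above[r - 1] + above[r] for r in range(1, len(above))]
        triangle.append([1] + interior + [1])
    return triangle


for row in pascal_triangle(6):
    print(row)
print(all(row == [comb(n, r) for r in range(n + 1)]
          for n, row in enumerate(pascal_triangle(15))))  # True
```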

To state the third theorem about binomial coefficients, let us make the
definition that $\binom{n}{r} = 0$ whenever n is a positive integer and r is a positive integer
greater than n. (Clearly, there is no way in which we can select a subset from a
set which contains more elements than the set itself.)


THEOREM 1.12

$$\sum_{r=0}^{k} \binom{m}{r} \binom{n}{k-r} = \binom{m+n}{k}$$

Proof. Using the same technique as in the proof of Theorem 1.11,
let us prove this theorem by equating the coefficients of $y^k$ in the expressions
on both sides of the equation

$$(1+y)^{m+n} = (1+y)^m (1+y)^n$$

The coefficient of $y^k$ in $(1+y)^{m+n}$ is $\binom{m+n}{k}$, and the coefficient of $y^k$ in

$$(1+y)^m (1+y)^n = \left[\binom{m}{0} + \binom{m}{1}y + \cdots + \binom{m}{m}y^m\right] \left[\binom{n}{0} + \binom{n}{1}y + \cdots + \binom{n}{n}y^n\right]$$

is the sum of the products which we obtain by multiplying the constant
term of the first factor by the coefficient of $y^k$ in the second factor, the
coefficient of $y$ in the first factor by the coefficient of $y^{k-1}$ in the second
factor, ..., and the coefficient of $y^k$ in the first factor by the constant term
of the second factor. Thus, the coefficient of $y^k$ in $(1+y)^m (1+y)^n$ is

$$\binom{m}{0}\binom{n}{k} + \binom{m}{1}\binom{n}{k-1} + \binom{m}{2}\binom{n}{k-2} + \cdots + \binom{m}{k}\binom{n}{0} = \sum_{r=0}^{k} \binom{m}{r} \binom{n}{k-r}$$

and this completes the proof. ▼


EXAMPLE 1.22 
Verify Theorem 1.12 numerically for m = 2, n = 3, and k = 4. 


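Solution

$$\binom{2}{0}\binom{3}{4} + \binom{2}{1}\binom{3}{3} + \binom{2}{2}\binom{3}{2} + \binom{2}{3}\binom{3}{1} + \binom{2}{4}\binom{3}{0} = \binom{5}{4}$$

and since $\binom{3}{4}$, $\binom{2}{3}$, and $\binom{2}{4}$ equal 0 according to the definition on page
16, the equation reduces to

$$\binom{2}{1}\binom{3}{3} + \binom{2}{2}\binom{3}{2} = \binom{5}{4}$$

which checks, since $2 \cdot 1 + 1 \cdot 3 = 5$. ▲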
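Example 1.22 checks one case of Theorem 1.12; a few lines of Python (function name ours) check many cases at once. Python's `math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention adopted just before the theorem, so terms such as $\binom{3}{4}$ drop out automatically.

```python
from math import comb


def vandermonde_lhs(m, n, k):
    """Left side of Theorem 1.12: the sum over r = 0..k of C(m, r) * C(n, k - r).

    math.comb returns 0 when the lower index exceeds the upper one, matching
    the convention C(n, r) = 0 for r > n.
    """
    return sum(comb(m, r) * comb(n, k - r) for r in range(k + 1))


print(vandermonde_lhs(2, 3, 4), comb(5, 4))  # Example 1.22: 5 5
print(all(vandermonde_lhs(m, n, k) == comb(m + n, k)
          for m in range(8) for n in range(8) for k in range(m + n + 1)))  # True
```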


Using Theorem 1.8, we can extend our discussion to multinomial coefficients,
namely, to the coefficients that arise in the expansion of $(x_1 + x_2 + \cdots + x_k)^n$.
The multinomial coefficient of the term $x_1^{r_1} \cdot x_2^{r_2} \cdot \ldots \cdot x_k^{r_k}$ in the expansion of
$(x_1 + x_2 + \cdots + x_k)^n$ is

$$\binom{n}{r_1, r_2, \ldots, r_k} = \frac{n!}{r_1! \cdot r_2! \cdot \ldots \cdot r_k!}$$


EXAMPLE 1.23


What is the coefficient of $x_1^3 x_2 x_3^2$ in the expansion of $(x_1 + x_2 + x_3)^6$?


Solution


Substituting into the formula above, we get

$$\binom{6}{3, 1, 2} = \frac{6!}{3! \cdot 1! \cdot 2!} = 60$$
▲
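The multinomial coefficient can also be found the long way, by multiplying out $(x_1 + x_2 + x_3)^6$ term by term: each of the $3^6 = 729$ terms picks one variable from each factor, and we count those that contain $x_1$ three times, $x_2$ once, and $x_3$ twice. The sketch below (names ours) does this and compares the count with the factorial formula.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def multinomial_coefficient(exponents):
    """n!/(r1! r2! ... rk!) with n = r1 + r2 + ... + rk."""
    return factorial(sum(exponents)) // prod(factorial(e) for e in exponents)


def coefficient_by_expansion(exponents):
    """Multiply out (x1 + ... + xk)^n term by term and count the terms x1^r1 ... xk^rk.

    Each term picks one variable (recorded by its index 0, ..., k-1) from each
    of the n factors.
    """
    k, n = len(exponents), sum(exponents)
    target = tuple(exponents)
    hits = 0
    for choice in product(range(k), repeat=n):
        counts = Counter(choice)
        if tuple(counts[i] for i in range(k)) == target:
            hits += 1
    return hits


print(multinomial_coefficient([3, 1, 2]), coefficient_by_expansion([3, 1, 2]))  # 60 60
```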


THEORETICAL EXERCISES 


1. In a two-team play-off in some sport, the winner is the first team to win m 
games. 
(a) Counting separately the number of play-offs requiring $m, m+1, \ldots,$
and $2m-1$ games, show that the total number of different outcomes
(sequences of wins and losses) is

$$2\left[\binom{m-1}{m-1} + \binom{m}{m-1} + \cdots + \binom{2m-2}{m-1}\right]$$


(b) How many different outcomes are there in a "2 out of 3" play-off, a “3 
out of 5" play-off, and a “4 out of 7” play-off? 


2. An operation consists of two steps, of which the first can be made in $n_1$ ways.
If the first step is made in the ith way, the second step can be made in $n_{2i}$ ways.†
(a) What is the total number of ways in which the whole operation can be 
made? 

(b) A student can study 0, 1, 2, or 3 hours for a statistics test on any given 
day. In how many ways can this student study at most 4 hours for the 
test on two consecutive days? 


3. When n is large, n! can be approximated by means of the expression 


$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^n$$


called Stirling’s formula, where e is the base of natural logarithms. (A 
derivation of this formula may be found in the book by W. Feller cited among 
the references at the end of this chapter.) 


† The use of double subscripts is explained in Appendix II.
(a) Use Stirling's formula to obtain approximations for 10! and 12!, and
find the percentage errors of these approximations by comparing them 
with the exact values given in Table VII. 

(b) Use Stirling's formula to obtain an approximation for the number of 
13-card bridge hands that can be dealt with an ordinary deck of 52 
playing cards. 

(c) Use Stirling's formula to show that 


$$\lim_{n \to \infty} \frac{\sqrt{\pi n}\,\binom{2n}{n}}{2^{2n}} = 1$$

4. In occupancy theory we are concerned with the number of ways in which
certain distinguishable or indistinguishable objects can be distributed among
a given number of individuals, urns, boxes, or cells.

(a) Find an expression for the number of ways in which r distinguishable 
objects can be distributed among n cells, and use it to find the number 
of ways in which three new books can be issued to 12 members of a 
library. 

(b) Find an expression for the number of ways in which r indistinguishable 
objects can be distributed among n cells, and use it to find the number 
of ways in which a baker can sell ten (indistinguishable) loaves of bread 
to six customers. (Hint: For $r = 5$ and $n = 3$, for example, we might
argue that 0|000|0 represents the case where one object goes into the
first cell, three into the second cell, and one into the third cell, and we
must look for the number of ways in which we can arrange the five 0's
and the two vertical bars.)

(c) Find an expression for the number of ways in which r indistinguishable 
objects can be distributed among n cells, with at least one object in 
each cell, and rework the numerical part of (b) if each customer must 
get at least one loaf of bread. 


5. When no table is available, it is sometimes convenient to determine binomial
coefficients by means of the following arrangement, called Pascal's triangle,

$$
\begin{array}{ccccccccccc}
 & & & & & 1 & & & & & \\
 & & & & 1 & & 1 & & & & \\
 & & & 1 & & 2 & & 1 & & & \\
 & & 1 & & 3 & & 3 & & 1 & & \\
 & 1 & & 4 & & 6 & & 4 & & 1 & \\
1 & & 5 & & 10 & & 10 & & 5 & & 1
\end{array}
$$

where each row begins with a 1, ends with a 1, and each other entry is the
sum of the nearest two entries in the row immediately above.

(a) Explain why, with this method of construction, the rth entry of the nth
row is the binomial coefficient $\binom{n-1}{r-1}$. (Hint: Use mathematical
induction and Theorem 1.11.)
(b) Construct the next two (seventh and eighth) rows of the triangle and
write the binomial expansions of $(x+y)^6$ and $(x+y)^7$.

6. Prove Theorem 1.11 by expressing all the binomial coefficients in terms of
factorials and then simplifying algebraically.


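7. Expressing the binomial coefficients in terms of factorials and simplifying
algebraically, show that

(a) $\binom{n}{r} = \frac{n-r+1}{r} \cdot \binom{n}{r-1}$;
(b) $\binom{n}{r} = \frac{n}{n-r} \cdot \binom{n-1}{r}$;
(c) $n\binom{n-1}{r} = (r+1)\binom{n}{r+1}$.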


8. Substitute suitable values for x and y into the formula of Theorem 1.9 to
show that

(a) $\sum_{r=0}^{n} \binom{n}{r} = 2^n$;

(b) $\sum_{r=0}^{n} (-1)^r \binom{n}{r} = 0$;

(c) $\sum_{r=0}^{n} \binom{n}{r} (a-1)^r = a^n$.


9. Repeatedly apply Theorem 1.11 to show that


10. Use Theorem 1.12 to show that


11. Show that $\sum_{r=0}^{n} r\binom{n}{r} = n2^{n-1}$

(a) by setting $x = 1$ in Theorem 1.9, then differentiating the expressions
on both sides with respect to y, and finally substituting $y = 1$;
(b) by making use of part (a) of Exercise 8 and part (c) of Exercise 7.


12. If n is not a positive integer or zero, the binomial expansion of $(1+y)^n$
yields, for $-1 < y < 1$, the infinite series

$$1 + \binom{n}{1}y + \binom{n}{2}y^2 + \binom{n}{3}y^3 + \cdots + \binom{n}{r}y^r + \cdots$$

where $\binom{n}{r} = \frac{n(n-1) \cdot \ldots \cdot (n-r+1)}{r!}$ for $r = 1, 2, 3, \ldots$. Use this
generalized definition of binomial coefficients, which, incidentally, agrees with
that on page 13 for positive integral values of n, to evaluate


(a) () and (2): 


(b) $\sqrt{5}$ by writing $\sqrt{5} = 2\left(1 + \frac{1}{4}\right)^{1/2}$ and using the first four terms of the
binomial expansion of $\left(1 + \frac{1}{4}\right)^{1/2}$.
Also, show that 


(c) $\binom{-1}{r} = (-1)^r$;

(d) $\binom{-n}{r} = (-1)^r\binom{n+r-1}{r}$ for $n > 0$.


13. Find the coefficient of x^y!z? in the expansion of (x + y + z)". 
14. Find the coefficient of x^y?z^w in the expansion of (2x + 3y — 42 + м)”. 
15. Show that 


( п; п ) 
пу, N25.. Пк 


by expressing all these multinomial coefficients in terms of factorials and 
simplifying algebraically. 
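Identities such as these can be spot-checked numerically before one attempts a proof. The short Python sketch below builds rows of Pascal's triangle using only the rule that each inner entry is the sum of the two nearest entries in the row above, compares the rows with the standard-library function `math.comb`, and checks the sums $\sum_r \binom{n}{r} = 2^n$, $\sum_r (-1)^r \binom{n}{r} = 0$, and $\sum_r r\binom{n}{r} = n2^{n-1}$ for small n. The function name `pascal_rows` is a hypothetical choice of ours, and a check of this kind is evidence, not a proof.

```python
from math import comb

def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle, built only by the
    rule that each inner entry is the sum of the two nearest entries above."""
    rows = [[1]]
    while len(rows) < count:
        prev = rows[-1]
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(8)

# The rth entry of the nth row (both counted from 1) should equal C(n-1, r-1).
for n, row in enumerate(rows, start=1):
    assert row == [comb(n - 1, r - 1) for r in range(1, n + 1)]

# Spot-check three summation identities for n = 1, ..., 10.
for n in range(1, 11):
    assert sum(comb(n, r) for r in range(n + 1)) == 2 ** n
    assert sum((-1) ** r * comb(n, r) for r in range(n + 1)) == 0
    assert sum(r * comb(n, r) for r in range(n + 1)) == n * 2 ** (n - 1)

print(rows[6])  # seventh row: coefficients in the expansion of (x + y)^6
print(rows[7])  # eighth row: coefficients in the expansion of (x + y)^7
```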


APPLIED EXERCISES

16. There are four routes, A, B, C, and D, between a person's home and the
place where he works, but route B is one-way so that he cannot take it on
the way to work, and route C is one-way so that he cannot take it on the
way home.

(a) Draw a tree diagram showing the various ways he can go to and from
work.

(b) Draw a tree diagram showing the various ways he can go to and from
work, without going the same route both ways.

17. A person with $2 in her pocket bets $1, even money, on the flip of a coin,
and she continues to bet $1 so long as she has any money. Draw a tree
diagram to show the various things that can happen during the first four flips
of the coin. In how many of the cases will she be

(a) exactly even;

(b) exactly $2 ahead?

18. If the NCAA has applications from six universities for hosting its
intercollegiate tennis championships in 1994 and 1995, in how many ways can
they select the hosts for these championships

(a) if they are not both to be held at the same university;

(b) if they may both be held at the same university?

19. The five finalists in the Miss Universe contest are Miss Argentina, Miss
Belgium, Miss U.S.A., Miss Japan, and Miss Norway. In how many ways
can the judges choose

(a) the winner and the first runner-up;

(b) the winner, the first runner-up, and the second runner-up?

20. In a primary election, there are four candidates for mayor, five candidates
for city treasurer, and two candidates for county attorney.

(a) In how many ways can a voter mark his ballot for all three of these
offices?

(b) In how many ways can a person vote if he exercises his option of not
voting for a candidate for any or all of these offices?

21. A multiple-choice test consists of 15 questions, each permitting a choice of
three alternatives. In how many different ways can a student check off her
answers to these questions?

22. The price of a European tour includes four stopovers to be selected from
among ten cities. In how many different ways can one plan such a tour

(a) if the order of the stopovers matters;
(b) if the order of the stopovers does not matter?

23. In how many ways can a television director schedule a sponsor's six different
commercials during the six time slots allocated to commercials during an
hour "special"?

24. In how many ways can the television director of Exercise 23 fill the six time
slots for commercials

(a) if the sponsor has three different commercials, each of which is to be
shown twice;
(b) if the sponsor has two different commercials, each of which is to be
shown three times?


25. In how many ways can five persons line up to get on a bus? In how many 
ways can they line up, if two of the persons refuse to follow each other? 


26. In how many ways can a family of five sit around a dinner table, if it matters 
only who sits next to whom? 


27. How many distinct permutations are there of the letters in the word
"statistics"? How many of these begin and end with the letter s?


28. A college team plays ten football games during a season. In how many ways 
can it end the season with five wins, four losses, and one tie? 


29. In Example 1.4 we showed that a true-false test which consists of 20 questions 
can be marked in 1,048,576 different ways. In how many ways can the 
questions be marked true or false so that 
(a) 7 are right and 13 are wrong; 

(b) 10 are right and 10 are wrong; 
(c) at least 17 are right? 

30. Among the seven nominees for two vacancies on a city council are three men 
and four women. In how many ways can these vacancies be filled 
(a) with any two of the seven nominees; 

(b) with any two of the four women; 
(c) with one of the men and one of the women? 


31. A shipment of ten television sets includes three that are defective. In how 
many ways can a hotel purchase four of these sets and receive at least two 
of the defective sets? 

32. Ms. Jones has four skirts, seven blouses, and three sweaters. In how many 
ways can she choose two of the skirts, three of the blouses, and one of the 
sweaters to take along on a trip? 

33. How many different bridge hands are possible containing five spades, three 
diamonds, three clubs, and two hearts? 

34. Find the number of ways in which one A, three B's, two C's, and one F can 
be distributed among seven students taking a course in statistics. 


35. If eight persons are having dinner together, in how many different ways can 
three order chicken, four order steak, and one order lobster? 


REFERENCES 
Among the few books on the history of statistics there are 


WALKER, H. M., Studies in the History of Statistical Method. Baltimore: The Williams &
Wilkins Company, 1929,

WESTERGAARD, H., Contributions to the History of Statistics. London: P. S. King & Son,
1932,

and the two more recent publications

KENDALL, M. G., and PLACKETT, R. L., eds., Studies in the History of Statistics and
Probability, Vol. II. New York: Macmillan Publishing Co., Inc., 1977,

PEARSON, E. S., and KENDALL, M. G., eds., Studies in the History of Statistics and
Probability. Darien, Conn.: Hafner Publishing Co., Inc., 1970.


A wealth of material on combinatorial methods can be found in 

COHEN, D. A., Basic Techniques of Combinatorial Theory. New York: John Wiley & Sons,
Inc., 1978,

EISEN, M., Elementary Combinatorial Analysis. New York: Gordon and Breach, Science
Publishers, Inc., 1970,

FELLER, W., An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed.
New York: John Wiley & Sons, Inc., 1968,

NIVEN, I., Mathematics of Choice. New York: Random House, Inc., 1965,

ROBERTS, F. S., Applied Combinatorics. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 19


and in
WHITWORTH, W. A., Choice and Chance, 5th ed. New York: Hafner Publishing Co., Inc.,
1959,


which has become a classic in this field. More advanced treatments may be found in 
BECKENBACH, E. F., ed., Applied Combinatorial Mathematics. New York: John Wiley &
Sons, Inc., 1964,
DAVID, F. N., and BARTON, D. E., Combinatorial Chance. New York: Hafner Publishing
Co., Inc., 1962,


and
RIORDAN, J., An Introduction to Combinatorial Analysis. New York: John Wiley & Sons,
Inc., 1958.


Probability 


2.1 INTRODUCTION


Historically, the oldest way of measuring probabilities, the classical probability
concept, applies when all possible outcomes are equally likely, as is presumably
the case in most games of chance. We can then say that if there are N equally
likely possibilities, of which one must occur and n are regarded as favorable, or as
a "success," then the probability of a "success" is given by the ratio $\frac{n}{N}$.


EXAMPLE 2.1 
What is the probability of drawing an ace from an ordinary deck of playing cards? 


Solution 


There are n = 4 aces among the N = 52 cards, so the probability of drawing
an ace is $\frac{4}{52} = \frac{1}{13}$. A
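The ratio n/N can be computed directly by counting, as in the following minimal Python sketch. It assumes, as the classical concept requires, that the listed outcomes are equally likely (the code cannot check this); the helper `classical_probability` and the card encoding are illustrative choices of ours, not notation from the text.

```python
from fractions import Fraction

def classical_probability(outcomes, is_favorable):
    """Return n / N, where N is the number of (assumed equally likely)
    outcomes and n is the number of them that count as a success."""
    N = len(outcomes)
    n = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(n, N)

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

p_ace = classical_probability(deck, lambda card: card[0] == "A")
print(p_ace)  # 1/13
```

Using `Fraction` keeps the answer exact, so 4/52 is reported in lowest terms.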


Although equally likely possibilities are found mostly in games of chance, 
the classical probability concept applies also in a great variety of situations where 
gambling devices are used to make random selections—when office space is 
assigned to teaching assistants by lot, when some of the families in a township 
are chosen in such a way that each one has the same chance of being included 
in a sample study, when machine parts are chosen for inspection so that each 
part produced has the same chance of being selected, and so forth. 

A major shortcoming of the classical probability concept is its limited 
applicability, for there are many situations in which the possibilities that arise 
cannot all be regarded as equally likely. This would be the case, for instance, if 
we are concerned with the question whether it will rain on a given day, if we are 
concerned with the outcome of an election, or if we are concerned with a person's 
recovery from a disease. 

Among the various probability concepts, most widely held is the frequency 
interpretation, according to which the probability of an event (outcome or happen- 
ing) is the proportion of the time that events of the same kind will occur in the long 
run. If we say that the probability is 0.84 that a jet from Los Angeles to San 
Francisco will arrive on time, we mean (in accordance with the frequency 
interpretation) that such flights arrive on time 84 percent of the time. Similarly, 
if the weather bureau predicts that there is a 30 percent chance for rain (namely, 
a probability of 0.30), this means that under the same weather conditions it will 
rain 30 percent of the time. More generally, we say that an event has a probability 
of, say, 0.90, in the same sense in which we might say that our car will start in 
cold weather 90 percent of the time. We cannot guarantee what will happen on 
any particular occasion—the car may start and then it may not—but if we kept 
records over a long period of time, we should find that the proportion of 
"successes" is very close to 0.90.
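The frequency interpretation can be illustrated, though not justified, by simulation. The sketch below assumes, purely for illustration, that the car starts on each cold morning independently with probability 0.90, and it uses Python's `random` module with a fixed seed so the run is reproducible; the function name `long_run_proportion` is hypothetical. The printed proportions should settle near 0.90 as the number of trials grows, while short runs may stray noticeably.

```python
import random

def long_run_proportion(p, trials, seed=1):
    """Simulate `trials` independent occasions on which a 'success' occurs
    with probability p, and return the proportion of successes."""
    rng = random.Random(seed)  # fixed seed: same sequence on every run
    successes = sum(1 for _ in range(trials) if rng.random() < p)
    return successes / trials

for trials in (10, 100, 1_000, 10_000, 100_000):
    print(trials, long_run_proportion(0.90, trials))
```

Note that the simulation presupposes the value 0.90; it only shows what the frequency interpretation says we should observe if that value is correct.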

An alternative point of view, which is currently gaining in favor, is to 
interpret probabilities as personal or subjective evaluations. Such probabilities 
express the strength of one's belief with regard to the uncertainties that are 
involved, and they apply especially when there is little or no direct evidence, so 
that there is no choice but to consider collateral (indirect) evidence, “educated 
guesses," and perhaps intuition and other subjective factors. 

The approach to probability we shall use in this chapter is the axiomatic 
approach, in which probabilities are defined as "mathematical objects" which 
behave according to certain well-defined rules. Then, any one of the above 
probability concepts, or interpretations, can be used in applications, so long as 
it is consistent with these rules. 
Probabilities always pertain to the occurrence or nonoccurrence of events, so let
us explain formally what we mean by "event" and by the related terms "experiment"
and "sample space."
It is customary in statistics to refer to any process of observation or 
measurement as an experiment. In this sense, an experiment may consist of the 
simple process of checking whether a switch is turned on or off; it may consist 
of counting the imperfections in a piece of cloth; or it may consist of the very 
complicated process of determining the mass of an electron. The results one 
obtains from an experiment, whether they are instrument readings, counts, “yes” 
or "no" answers, or values obtained through extensive calculations, are called 
the outcomes of the experiment. 

The set of all possible outcomes of an experiment is called the sample space 
and it is usually denoted by the letter S. Each outcome in a sample space is called 
an element of the sample space or simply a sample point. If a sample space has 
a finite number of elements, we may list the elements in the usual set notation; 
for instance, the sample space for the possible outcomes of one flip of a coin 
may be written 


$$S = \{H, T\}$$


where H and T stand for head and tail. Sample spaces with a large or infinite 
number of elements are best described by a statement or rule; for example, if 
the possible outcomes of an experiment are the set of automobiles equipped with 
citizens band (CB) radios, the sample space may be written 


$$S = \{x \mid x \text{ is an automobile with a CB radio}\}$$


This is read “S is the set of all x such that x is an automobile with a CB radio." 
Similarly, if S is the set of odd positive integers, we write 


$$S = \{2k + 1 \mid k = 0, 1, 2, \ldots\}$$


How we formulate the sample space for a given situation will depend on 
the problem at hand. If an experiment consists of one roll of a die and we are 
interested in the number on the face turned up, we would use the sample space 


$$S_1 = \{1, 2, 3, 4, 5, 6\}$$


If we are interested only in whether the number is even or odd, we would use 
the sample space 


$$S_2 = \{\text{even}, \text{odd}\}$$


This example also illustrates the fact that different sample spaces may be
used to describe one and the same experiment. Both $S_1$ and $S_2$ represent outcomes
for an experiment consisting of one roll of a die. Which one is appropriate
depends on the problem at hand, but $S_1$ clearly provides more information than
$S_2$. If we know which element in $S_1$ occurs, we can tell which element in $S_2$
occurs; however, if we know which element in $S_2$ occurs, we cannot tell which
element in $S_1$ occurs. Generally speaking, it is desirable to use a sample space
whose elements cannot be "subdivided" into more primitive or more elementary
kinds of outcomes; that is, an element of a sample space should not represent two
or more outcomes which are distinguishable in some fashion.


EXAMPLE 2.2 


Describe the sample space for an experiment in which we roll a pair of dice, one 
red and one green. 


Solution 


The sample space which provides the most information consists of the 36 
points given by 


$$S_1 = \{(x, y) \mid x = 1, 2, \ldots, 6;\ y = 1, 2, \ldots, 6\}$$


where x represents the number of points rolled with the red die and y 
represents the number of points rolled with the green die. A second sample 
space, adequate for some purposes though generally less desirable as it 
provides less information, might be written as 


$$S_2 = \{2, 3, 4, \ldots, 12\}$$


where the 11 elements represent the possible totals rolled with the pair of 
dice. A 
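The two sample spaces of Example 2.2 can be generated mechanically. In this Python sketch, `itertools.product` lists the 36 ordered pairs of $S_1$, and the hypothetical helper `total` maps each pair to the corresponding element of $S_2$. Because several pairs share the same total, knowing the point of $S_1$ determines the point of $S_2$, but not conversely.

```python
from itertools import product

# S1: ordered pairs (x, y), x = points on the red die, y = points on the green die.
S1 = set(product(range(1, 7), repeat=2))

def total(point):
    """Map a point of S1 to the corresponding point of S2 (the total rolled)."""
    x, y = point
    return x + y

# S2: the possible totals, obtained as the image of S1 under `total`.
S2 = {total(point) for point in S1}

print(len(S1), len(S2))  # 36 11
print(sorted(S2))        # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Several points of S1 give the same total, so S2 cannot recover S1.
print(sum(1 for point in S1 if total(point) == 7))  # 6
```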


Sample spaces are usually classified according to the number of elements
which they contain. In the preceding example the sample spaces $S_1$ and $S_2$
contained a finite number of elements, but if a coin is flipped until a head appears
for the first time, this could happen on the first flip, the second flip, the third
flip, the fourth flip, ..., and there are infinitely many possibilities. For this
experiment we obtain the sample space


$$S = \{H, TH, TTH, TTTH, TTTTH, \ldots\}$$


with an unending sequence of elements. But even here the number of elements 
can be matched one-to-one with the whole numbers, and in this sense the sample 
space is said to be countable. If a sample space contains a finite number of 
elements, or an infinite though countable number of elements, it is said to be 
discrete. 
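The one-to-one matching with the whole numbers can be made explicit. In the following Python sketch, a generator (our own illustrative construction, with the hypothetical names `first_head_outcomes` and `position`) produces H, TH, TTH, ... in order; the kth element it yields is k - 1 tails followed by a head, so `position` recovers k from the outcome. Only finitely many elements can ever be printed, which is why such a sample space is described by a rule.

```python
from itertools import count, islice

def first_head_outcomes():
    """Yield H, TH, TTH, ...: the kth outcome (k = 1, 2, 3, ...) is
    k - 1 tails followed by the first head."""
    for k in count(1):
        yield "T" * (k - 1) + "H"

def position(outcome):
    """Inverse of the matching: the flip on which the first head appeared."""
    return len(outcome)

first_five = list(islice(first_head_outcomes(), 5))
print(first_five)                         # ['H', 'TH', 'TTH', 'TTTH', 'TTTTH']
print([position(o) for o in first_five])  # [1, 2, 3, 4, 5]
```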
The outcomes of some experiments are neither finite nor countably infinite.
Such is the case, for example, when one conducts an investigation to determine
the distance that a certain make of car will travel over a prescribed test course
on 5 liters of gasoline. If we assume that distance is a variable that can be 
measured to any desired degree of accuracy, there is an infinity of possible 
distances that cannot be matched one-to-one with the whole numbers. Also, if 
one wants to record the length of time it takes for two chemicals to react, the 
possible lengths of time making up the sample space are infinite in number and 
not countable. Thus, sample spaces need not be discrete. If a sample space 
contains an infinite number of sample points constituting a continuum, such as 
all the points on a line segment or all the points in a plane, it is said to be continuous. 

Continuous sample spaces arise in practice whenever the outcomes of
experiments are measurements of physical properties such as temperature, speed,
pressure, or length, which are measured on continuous scales.
In many problems we are interested in the occurrence of events that are not given 
directly by a specific element of a sample space. 


EXAMPLE 2.3 


With reference to the sample space $S_1$ on page 27, describe the event A that the 
number of points rolled with a die is divisible by 3. 


Solution 


The number of points rolled will be divisible by 3 if the outcome is 3 or 6; 
namely, if the outcome is an element of the subset $A = \{3, 6\}$ of the sample 
space. A 


EXAMPLE 2.4 


With reference to the sample space $S_1$ of Example 2.2, describe the event B that 
the total number of points rolled with a pair of dice is 7. 


Solution 


This will occur if the outcome is an element of the subset 


B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} 
of the sample space $S_1$. Note that in Figure 2.1 the event of rolling a total 
of 7 with a pair of dice is represented by the set of points inside the region 
bounded by the dotted line. A 


Figure 2.1 The event of rolling a total of seven with a pair of dice (axes: red die and green die).


In the same way, any event can be assigned a collection of sample points, 
which constitute a subset of an appropriate sample space. This subset represents 
all the elements for which the event occurs and, in probability theory, we identify 
the subset with the event. Thus, by definition, an event is a subset of a sample space. 
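Since an event is a subset of the sample space, it can be represented directly as a set. The Python sketch below rebuilds the events of Examples 2.3 and 2.4 with set comprehensions; the variable names are illustrative and not taken from the text.

```python
from itertools import product

S1_die = {1, 2, 3, 4, 5, 6}                    # one roll of a die
S1_pair = set(product(range(1, 7), repeat=2))  # (red, green), as in Example 2.2

# An event is the subset of sample points for which it occurs.
A = {x for x in S1_die if x % 3 == 0}              # points divisible by 3
B = {(x, y) for (x, y) in S1_pair if x + y == 7}   # total of 7

print(sorted(A))  # [3, 6]
print(sorted(B))  # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]

# Both events are subsets of their sample spaces.
assert A <= S1_die and B <= S1_pair
```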


EXAMPLE 2.5 


If someone shoots at a target three times and we are interested only in whether 
each shot is a hit or a miss, describe the sample space S, the event M that the 
person will miss the target in each of the three shots, and the event H that the 
person will hit the target once and miss it twice. 


Solution 


If we label the outcome of each shot 0 for a miss and 1 for a hit, the eight
sample points of S might be displayed as the three-dimensional geometric
configuration of Figure 2.2. Then, the subset $M = \{(0, 0, 0)\}$ represents the
event of missing the target in each of the three shots, and the subset
$H = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$ represents the event of hitting the target
once and missing it twice. Of course, the entire sample space represents the
event of getting either 0, 1, 2, or 3 hits in three shots, an event that is certain
to occur. A
Figure 2.2 Sample space for three shots at a target. 
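The sample space and events of Example 2.5 can be enumerated the same way. In this sketch each sample point is a tuple of three 0s and 1s (0 for a miss, 1 for a hit), matching the labeling used in the solution; the variable names are our own.

```python
from itertools import product

S = set(product((0, 1), repeat=3))   # one coordinate per shot
M = {s for s in S if sum(s) == 0}    # miss the target on all three shots
H = {s for s in S if sum(s) == 1}    # hit once and miss twice
certain = {s for s in S if sum(s) in (0, 1, 2, 3)}  # 0, 1, 2, or 3 hits

print(len(S))        # 8
print(sorted(M))     # [(0, 0, 0)]
print(sorted(H))     # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
print(certain == S)  # True: this event is the whole sample space
```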


EXAMPLE 2.6 


Construct a sample space for the length of the useful life of a certain electronic 
component and indicate the subset which represents the event F that the com- 
ponent fails before the end of the sixth year. 


Solution 


If t is the length of the component's useful life in years, the sample space
may be written $S = \{t \mid t > 0\}$, and the subset $F = \{t \mid 0 < t < 6\}$ is the event
that the component fails before the end of the sixth year. A
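For a continuous sample space such as the one in Example 2.6 the points cannot be listed, so an event is most naturally given by a rule. A minimal Python sketch, assuming lifetimes are represented as floating-point numbers of years, expresses S and F as membership tests; the function names `in_S` and `in_F` are hypothetical.

```python
def in_S(t: float) -> bool:
    """Sample space S = {t | t > 0}: useful life in years."""
    return t > 0

def in_F(t: float) -> bool:
    """Event F = {t | 0 < t < 6}: failure before the end of the sixth year."""
    return 0 < t < 6

# Spot-check a few lifetimes; every point of F is also a point of S.
for t in (0.5, 5.999, 6.0, 12.3):
    print(t, in_S(t), in_F(t))
```

A membership test can only be evaluated at chosen points; it does not enumerate the event, which reflects the distinction between discrete and continuous sample spaces.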


According to our definition, any event is a subset of an appropriate sample 
space, but it should be observed that the converse is not necessarily true. For 
discrete sample spaces all subsets are events, but in the continuous case some 
rather abstruse point sets must be excluded for mathematical reasons. This is 
discussed further in some of the more advanced texts listed among the references 
on page 73, but it is of no consequence so far as the work of this book is concerned.

In many problems of probability we are interested in events which are
actually combinations of two or more events, formed by taking unions, intersections,
and complements. Although the reader must surely be familiar with these
terms, let us review briefly that if A and B are any two subsets of a sample space
S, their union $A \cup B$ is the subset of S which contains all the elements that are
either in A, in B, or in both; their intersection $A \cap B$ is the subset of S which
contains all the elements that are in both A and B; and the complement $A'$ of
A is the subset of S which contains all the elements of S that are not in A. The
various rules which control the formation of unions, intersections, and complements
are summarized in Appendix I at the end of the book.

Sample spaces and events, particularly relationships among events, are often 
depicted by means of Venn diagrams, in which the sample space is represented 
by a rectangle, while events are represented by regions within the rectangle, 
usually by circles or parts of circles. For instance, the first diagram of Figure 2.3
serves to indicate that events A and B are mutually exclusive, namely, that they
cannot both occur simultaneously. This same concept is conveyed by the second
diagram of Figure 2.3, where the region representing $A \cap B$ is shaded darker to
indicate that it is empty.* Symbolically, we write $A \cap B = \emptyset$ when A and B are
mutually exclusive; since they have no elements in common, their intersection
equals the empty set $\emptyset$.


Figure 2.3 Diagrams showing that events A and B are mutually exclusive.
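Python's built-in `set` type supports union, intersection, and complement (taken as a difference from S) directly, and its `isdisjoint` method tests whether two events are mutually exclusive. The sets below are a hypothetical illustration based on one roll of a die, not sets from the text.

```python
S = {1, 2, 3, 4, 5, 6}   # one roll of a die
A = {3, 6}               # number divisible by 3
B = {2, 4, 6}            # even number
C = {1, 5}               # hypothetical third event, for illustration

union = A | B            # A ∪ B: elements in A, in B, or in both
intersection = A & B     # A ∩ B: elements in both A and B
complement_A = S - A     # A': elements of S that are not in A

print(sorted(union), sorted(intersection), sorted(complement_A))
# [2, 3, 4, 6] [6] [1, 2, 4, 5]

# Mutually exclusive means the intersection is the empty set.
print(A.isdisjoint(C), A & C == set())  # True True
print(A.isdisjoint(B))                  # False, since 6 is in both
```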


APPLIED EXERCISES 


1. If S = {1, 2,3, 4,5, 6,7, 8,9}, A = {1,3,5, 7}, B = {6,7,8,9}, C = {2,4,8}, 
and D = {1,5,9}, list the elements of the subsets of S corresponding to the 
following events: 

(а) A'nB; (b (A'a В) NC; (су Bo ед 
(d (Во С) һр; (е) A'n C; (ff (A'o C)^ D 


* It is the custom nowadays to refer to all these diagrams as Venn diagrams, although,
strictly speaking, the term was originally meant to apply only when the circles intersect
as in the second diagram of Figure 2.3, or as in the diagram of Figure 2.6 on page 47.

2. An electronics firm plans to build a research laboratory in Southern California,
and its management has to decide between sites in Los Angeles, San Diego,
Long Beach, Pasadena, Santa Barbara, Anaheim, Santa Monica, and Westwood.
If A represents the event that they will choose a site in San Diego or
Santa Barbara, B represents the event that they will choose a site in San 
Diego or Long Beach, C represents the event that they will choose a site in 
Santa Barbara or Anaheim, and D represents the event that they will choose 
a site in Los Angeles or Santa Barbara, list the elements of each of the 
following subsets of the sample space, which consists of the eight site 
selections: 


(a) А; (b р; (ә COD, 
(d Ве С; (е) BUC; (f) AUB; 
(в) "CO D (h (Bo C) (i) BAC’. 


3. Among the eight cars which a dealer has in his showroom, Car 1 is new, has
air-conditioning, power steering, and bucket seats, Car 2 is one year old, has
air-conditioning, but neither power steering nor bucket seats, Car 3 is two
years old, has air-conditioning and power steering, but no bucket seats, Car
4 is three years old, has air-conditioning, but neither power steering nor
bucket seats, Car 5 is new, has no air-conditioning, no power steering, and
no bucket seats, Car 6 is one year old, has power steering, but neither
air-conditioning nor bucket seats, Car 7 is two years old, has no
air-conditioning, no power steering, and no bucket seats, and Car 8 is three years
old, has no air-conditioning, but power steering as well as bucket seats. If a
customer buys one of these cars, and the event that he chooses a new car,
for example, is represented by the set {Car 1, Car 5}, indicate similarly the
sets which represent the events that
(a) he chooses a car without air-conditioning; 

(b) he chooses a car without power steering; 

(c) he chooses a car with bucket seats; 

(d) he chooses a car that is either two or three years old. 

Also state in words what kind of car he will choose, if his choice is given by 
(e) the complement of the set of part (a); 

(f) the union of the sets of parts (b) and (c); 

(g) the intersection of the sets of parts (c) and (d); 

(h) the intersection of the sets of parts (f) and (g). 

4. If Ms. Brown buys one of the houses advertised for sale in a Seattle newspaper 
(on a given Sunday), T is the event that the house has three or more baths, 
U is the event that it has a fireplace, V is the event that it costs more than 
$60,000, and W is the event that it is new, describe (in words) each of the 
following events: 


(a) T5 (b) U'* (су Vi 
(d) W5 (е): doo Us (Due Vs 
(p Оп У; (h) Vo W; (i) V o, W; 


Q ToU; () TU V; (D Vow. 
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Б. А resort hotel has two station wagons, which it uses to shuttle its guests to 
and from the airport. If the larger of the two station wagons can carry 5 
passengers and the smaller can carry 4 passengers, the point (0, 3) represents 
the event that at a given moment the larger station wagon is empty while the 
smaller one has 3 passengers, the point (4, 2) represents the event that at the 
given moment the larger station wagon has 4 passengers while the smaller 
one has 2 passengers, . . . , draw a fignre showing the 30 points of the corre- 
sponding sample space. Also, if E stands for the event that at least one of 
the station wagons is empty, F stands for the event that together they carry 
2, 4, or 6 passengers, and G stands for the event that each carries the same 
number of passengers, list the points of the sample space which correspond 
to each of the following events: 


(a) E; (b) F; (c) G;
(d EOF; (е) Еп Е; ( FUG; 
(Q EOFS h) Ес С; СЕУ ТЕ, 


6. A coin is tossed once. Then, if it comes up heads, a die is thrown once; if it
comes up tails, it is tossed twice more. Using the notation in which (H, 2),
for example, denotes the event that the coin comes up heads and then the
die comes up 2, and (T, T, T) denotes the event that the coin comes up tails
three times in a row, list
(a) the ten elements of the sample space S;

(b) the elements of S corresponding to event A that exactly one head occurs;
(c) the elements of S corresponding to event B that at least two tails occur 
or a number greater than 4 occurs. 


7. An electronic game contains three components arranged in the series-parallel 
circuit shown in Figure 2.4. At any given time, each component may or may 
not be operative, and the game will operate only if there is a continuous 
circuit from P to Q. Let A be the event that the game will operate; let B be 
the event that the game will operate though component x is not operative; 
and let C be the event that the game will operate though component y is 
not operative. Using the notation in which (0, 0, 1), for example, denotes that 
component z is operative but components x and y are not, 

(a) list the elements of the sample space S and also the elements of S 
corresponding to events A, B, and C; 
(b) determine which pairs of events are mutually exclusive. 


Figure 2.4 Diagram for Exercise 7.
8. An experiment consists of rolling a die until a 3 appears. Describe the sample
space and determine

(a) how many elements of the sample space correspond to the event that 
the 3 appears on the kth roll of the die; 

(b) how many elements of the sample space correspond to the event that 
the 3 appears not later than the kth roll of the die. 

9. If $S = \{x \mid 0 < x < 10\}$, $M = \{x \mid 3 < x < 8\}$, and $N = \{x \mid 5 < x < 10\}$, find

(a) MUN; b MOAN; (с) MON (d) MUN. 

10. Symbolically, describe the sample space S consisting of all points (x, y) on 
or in the interior of a circle of radius 3 centered at the point (2, -3). 


11. In a group of 200 college students 138 are enrolled in a course in psychology, 
115 are enrolled in a course in sociology, and 91 are enrolled in both. How 
many of these students are not enrolled in either course? (Hint: Draw a 
suitable Venn diagram and fill in the numbers associated with the various 
regions.) 
12. A market research organization claims that among 500 shoppers interviewed, 
308 regularly buy Product X, 266 regularly buy Product Y, 103 regularly buy 
both, and 59 buy neither on a regular basis. Using a Venn diagram and filling 
in the number of shoppers associated with the various regions, check whether 
the results of this study should be questioned. 
13. Among 120 visitors to Disneyland, 74 stayed for at least three hours, 86 spent 
at least $8.00, 64 went on the Matterhorn ride, 60 stayed for at least three 
hours and spent at least $8.00, 52 stayed for at least three hours and went 
on the Matterhorn ride, 54 spent at least $8.00 and went on the Matterhorn 
ride, and 48 stayed for at least three hours, spent at least $8.00, and went on 
the Matterhorn ride. Drawing a Venn diagram with three circles (like that 
of Figure 2.6 on page 47) and filling in the numbers associated with the 
various regions, find 
(a) how many of the visitors to Disneyland stayed for at least three hours, 
spent at least $8.00, but did not go on the Matterhorn ride; 

(b) how many of the visitors to Disneyland went on the Matterhorn ride, 
but stayed less than three hours and spent less than $8.00; 

(c) how many of the visitors to Disneyland stayed less than three hours, 
spent at least $8.00, but did not go on the Matterhorn ride. 


2.4 THE PROBABILITY OF AN EVENT


To formulate the postulates of probability we shall follow the practice of denoting 
events by means of capital letters, and we shall write the probability of event A 
as P(A), the probability of event B as P(B), and so forth. As before, we shall 
denote the set of all possible outcomes, the sample space, by the letter S. 
Probabilities are values of a set function, also called a probability measure,
for as we shall see, this function assigns real numbers to the various subsets of
a sample space S. As we shall formulate them here, the postulates of probability
apply only when the sample space S is discrete.


POSTULATE 1 The probability of an event is a non-negative real number;
that is, $P(A) \geq 0$ for any subset A of S.


POSTULATE 2 $P(S) = 1$.


POSTULATE 3 If $A_1, A_2, A_3, \ldots$ is a finite or infinite sequence of mutually
exclusive events of S, then

$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots$$


Postulates per se require no proof, but if the resulting theory is to be applied, 
we must show that the postulates are satisfied when we give probabilities a “real” 
meaning. Let us illustrate this here in connection with the frequency interpretation;
the relationship between the postulates and the classical probability concept will 
be discussed on page 40, while the relationship between the postulates and 
subjective probabilities is left for the reader to examine in Exercises 12 and 17 
on pages 47 and 49. 

Since proportions are always positive or zero, the first postulate is in 
complete agreement with the frequency interpretation. The second postulate states 
indirectly that certainty is identified with a probability of 1—after all, it is always 
assumed that one of the possibilities in S must occur, and it is to this certain 
event that we assign a probability of 1. So far as the frequency interpretation is
concerned, a probability of 1 implies that the event in question will occur 100
percent of the time, or in other words, that it is certain to occur.

Taking the third postulate in the simplest case, namely, for two mutually
exclusive events $A_1$ and $A_2$, it can easily be seen that it is satisfied by the frequency
interpretation. If one event occurs, say, 28 percent of the time, another event
occurs 39 percent of the time, and the two events cannot both occur at the same
time (that is, they are mutually exclusive), then one or the other will occur
28 + 39 = 67 percent of the time. Thus, the third postulate is satisfied, and the
same kind of argument applies when there are more than two mutually exclusive
events.

Before we study some of the immediate consequences of the postulates of
probability, let us emphasize the point that the three postulates do not tell us
how to assign probabilities to events; they merely restrict the ways in which it
can be done.
EXAMPLE 2.7 


For each of the following, explain why it is not a permissible way of assigning 
probabilities to the four possible and mutually exclusive outcomes A, B, C, and 
D of an experiment: 

(a) $P(A) = 0.12$, $P(B) = 0.63$, $P(C) = 0.45$, $P(D) = -0.20$; 

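(b) $P(A) = \frac{9}{120}$, $P(B) = \frac{45}{120}$, $P(C) = \frac{27}{120}$, $P(D) = \frac{46}{120}$. 


Solution 


In (a) we find that $P(D) = -0.20$ violates Postulate 1, and in (b) we get 

$$P(S) = P(A \cup B \cup C \cup D) = \frac{9}{120} + \frac{45}{120} + \frac{27}{120} + \frac{46}{120} = \frac{127}{120}$$

which violates Postulate 2. 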
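The check carried out in Example 2.7 can be sketched in Python, assuming the assignment is given as a dictionary from outcome names to probabilities; the function name `is_permissible` is our own illustration, not notation from the text. Exact `Fraction` arithmetic is used so that the test of Postulate 2 is not disturbed by floating-point rounding.

```python
from fractions import Fraction

def is_permissible(probs):
    """Return the postulates violated by an assignment to mutually exclusive,
    exhaustive outcomes (an empty list means no violation was found).

    Postulate 3 is taken care of by construction: probabilities of composite
    events are obtained by adding outcome probabilities (Theorem 2.1).
    """
    violations = []
    if any(p < 0 for p in probs.values()):
        violations.append("Postulate 1")   # some probability is negative
    if sum(probs.values()) != 1:
        violations.append("Postulate 2")   # probabilities do not total 1
    return violations

# Part (a) of Example 2.7; values given as strings so they convert exactly
part_a = {k: Fraction(v) for k, v in
          {"A": "0.12", "B": "0.63", "C": "0.45", "D": "-0.20"}.items()}
# A hypothetical assignment that satisfies both checks
uniform = {k: Fraction(1, 4) for k in "ABCD"}

print(is_permissible(part_a))
print(is_permissible(uniform))
```

For part (a) only Postulate 1 is reported, since the four values happen to add up to exactly 1.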


Of course, in actual practice probabilities are assigned on the basis of past 
experience, on the basis of a careful analysis of all underlying conditions, or on 
the basis of assumptions (sometimes, the assumption that all possible outcomes 
are equiprobable). 

To assign a probability measure to a sample space, it is not necessary to 
specify the probability of each possible subset. This is fortunate, for a sample 
space with as few as 20 possible outcomes already has $2^{20} = 1{,}048{,}576$ subsets 
[the general formula follows directly from part (a) of Exercise 8 on page 20], 
and the number of subsets grows very rapidly when there are 50 possible outcomes, 
100 possible outcomes, or more. Instead of listing the probabilities of all possible 
subsets, we often list the probabilities of the individual outcomes, or sample 
points of $S$, and then make use of the following theorem: 


THEOREM 2.1 If $A$ is an event in a discrete sample space $S$, then $P(A)$ 
equals the sum of the probabilities of the individual outcomes comprising $A$. 


Proof. Let $O_1, O_2, O_3, \ldots$ be the finite or infinite sequence of 
outcomes which comprise the event $A$. Thus, 

$$A = O_1 \cup O_2 \cup O_3 \cup \cdots$$

and since the individual outcomes, the $O$'s, are by definition mutually 
exclusive, the third postulate of probability yields 

$$P(A) = P(O_1) + P(O_2) + P(O_3) + \cdots$$

This completes the proof. 
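To make Theorem 2.1 concrete, the Python sketch below lists outcome probabilities for a small hypothetical sample space (three outcomes with probabilities 1/2, 1/3, and 1/6, chosen only for illustration), generates every one of its $2^3$ events, and obtains the probability of any event by addition. The names `outcome_prob`, `all_events`, and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical discrete sample space: only the outcome probabilities are listed
outcome_prob = {"O1": Fraction(1, 2), "O2": Fraction(1, 3), "O3": Fraction(1, 6)}

def all_events(outcomes):
    """Yield every subset (event) of a finite sample space, empty set included."""
    for r in range(len(outcomes) + 1):
        yield from combinations(outcomes, r)

def prob(event):
    """Theorem 2.1: P(A) is the sum of the probabilities of the outcomes in A."""
    return sum((outcome_prob[o] for o in event), Fraction(0))

events = list(all_events(list(outcome_prob)))
print(len(events))                  # number of events, 2**3
print(prob(("O1", "O3")))           # 1/2 + 1/6
print(prob(tuple(outcome_prob)))    # P(S)
```

Three listed numbers determine the probabilities of all eight events; with 20 outcomes, 20 numbers would determine all $2^{20}$.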
For Theorem 2.1 to be useful, we must be able to assign probabilities to 
the individual outcomes of experiments. How this may be done in some special 
situations is illustrated by the following examples: 


EXAMPLE 2.8 


If a balanced coin is tossed twice, what is the probability of getting at least one 
head? 


Solution 


The sample space for this experiment is 

$$S = \{HH, HT, TH, TT\}$$

Since the coin is balanced, we assume that each of these outcomes is equally 
likely to occur, and we therefore assign a probability of $\frac{1}{4}$ to each sample 
point. If $A$ is the event that we will get at least one head, then $A = \{HH, HT, TH\}$ and 

$$P(A) = P(HH) + P(HT) + P(TH) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}$$
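The same calculation can be sketched by brute-force enumeration in Python, assuming, as the example does, that the four outcomes are equally likely:

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a balanced coin: HH, HT, TH, TT
S = ["".join(t) for t in product("HT", repeat=2)]
p = {outcome: Fraction(1, len(S)) for outcome in S}   # equally likely outcomes

A = [o for o in S if "H" in o]      # event: at least one head
P_A = sum(p[o] for o in A)          # Theorem 2.1: add the outcome probabilities
print(S, A, P_A)
```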


EXAMPLE 2.9 


A die is loaded in such a way that each odd number is twice as likely to occur 
as each even number. If $E$ is the event that a number greater than 3 occurs on 
a single toss of the die, find $P(E)$. 


Solution 


The sample space is $S = \{1, 2, 3, 4, 5, 6\}$. If we assign a probability of $w$ to 
each even number and a probability of $2w$ to each odd number, we find 
that $2w + w + 2w + w + 2w + w = 9w = 1$ in accordance with Postulate 
2. Thus, $w = \frac{1}{9}$ and 

$$P(E) = \frac{1}{9} + \frac{2}{9} + \frac{1}{9} = \frac{4}{9}$$
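A short Python sketch of the same reasoning: the weights encode "each odd number is twice as likely as each even number," and Postulate 2 fixes the scale $w$.

```python
from fractions import Fraction

faces = range(1, 7)
weight = {k: 2 if k % 2 else 1 for k in faces}   # odd faces get weight 2, even faces 1
w = Fraction(1, sum(weight.values()))            # 9w = 1 by Postulate 2
p = {k: weight[k] * w for k in faces}            # probability of each face

P_E = sum(p[k] for k in faces if k > 3)          # E: a number greater than 3
print(w, P_E)
```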


If a sample space is discrete but infinite, probabilities will have to be assigned 
to the individual outcomes by means of some mathematical rule. 
EXAMPLE 2.10 


If $O_1, O_2, O_3, \ldots$ represent the infinitely many outcomes of an experiment, verify 
that a permissible probability measure is given by the rule 

$$P(O_i) = \frac{1}{2^i} \quad \text{for } i = 1, 2, 3, \ldots$$


Solution 


Since the probabilities are all positive, it remains to be shown that $P(S) = 1$. 
Getting 

$$P(S) = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots$$

and making use of the formula for the sum of the terms in an infinite 
geometric progression, we find that 

$$P(S) = \frac{1/2}{1 - 1/2} = 1$$

To be rigorous in a situation like this, the word “sum” in Theorem 2.1 will 
have to be interpreted so that it includes the value of an infinite series. 

As we shall see in Chapter 5, the probability measure given in Example 
2.10 would be appropriate if $O_i$ represents the event that a person flipping a 
balanced coin will get a head for the first time on the $i$th try. Thus, the probability 
that the first head will come on the third, fourth, or fifth try is $\left(\frac{1}{2}\right)^3 + \left(\frac{1}{2}\right)^4 + \left(\frac{1}{2}\right)^5 = \frac{7}{32}$, 
and the probability that the first head will come on an odd-numbered try is 

$$\frac{1}{2} + \left(\frac{1}{2}\right)^3 + \left(\frac{1}{2}\right)^5 + \cdots = \frac{1/2}{1 - 1/4} = \frac{2}{3}$$

where we again made use of the formula for the sum of the terms in an infinite 
geometric progression. 
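A quick numerical check of these series, sketched in Python with exact fractions. The helpers `first_head_prob` and `geometric_sum` are illustrative; `geometric_sum(a, r)` simply evaluates the formula $a/(1 - r)$ for the sum of an infinite geometric progression with $|r| < 1$, and the partial sums show how closely 40 terms approach the closed forms.

```python
from fractions import Fraction

def first_head_prob(i):
    """P(O_i) = (1/2)**i: first head on the i-th toss of a balanced coin."""
    return Fraction(1, 2 ** i)

def geometric_sum(a, r):
    """Sum a + a*r + a*r**2 + ... of an infinite geometric progression, |r| < 1."""
    return a / (1 - r)

# First head on the third, fourth, or fifth try
p_345 = first_head_prob(3) + first_head_prob(4) + first_head_prob(5)

# Closed forms: all tries (a = 1/2, r = 1/2) and odd-numbered tries (a = 1/2, r = 1/4)
p_all = geometric_sum(Fraction(1, 2), Fraction(1, 2))
p_odd = geometric_sum(Fraction(1, 2), Fraction(1, 4))

# Partial sums over the first 40 tries fall just short of the closed forms
partial_all = sum(first_head_prob(i) for i in range(1, 41))
partial_odd = sum(first_head_prob(i) for i in range(1, 41, 2))

print(p_345, p_all, p_odd)
print(float(p_all - partial_all), float(p_odd - partial_odd))
```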

If an experiment is such that we can assume equal probabilities for the 
sample points of S, as was the case in Example 2.8, we can take advantage of 
the following special case of Theorem 2.1: 


THEOREM 2.2 If an experiment can result in any one of $N$ different equally 
likely outcomes, and if $n$ of these outcomes together constitute event $A$, 
then the probability of event $A$ is 

$$P(A) = \frac{n}{N}$$
Proof. Let $O_1, O_2, \ldots, O_N$ represent the individual outcomes in $S$, 
each with probability $\frac{1}{N}$. If event $A$ is the union of $n$ of these mutually 
exclusive outcomes, and it does not matter which ones, then 

$$\begin{aligned}
P(A) &= P(O_1 \cup O_2 \cup \cdots \cup O_n) \\
&= P(O_1) + P(O_2) + \cdots + P(O_n) \\
&= n \cdot \frac{1}{N} = \frac{n}{N}
\end{aligned}$$


Observe that the formula $P(A) = \frac{n}{N}$ of Theorem 2.2 is identical with that 
of the classical probability concept, which we gave on page 25. Indeed, what we 
have shown here is that the classical probability concept is consistent with the 
postulates of probability: it follows from the postulates in the special case where 
the individual outcomes are all equiprobable. 


EXAMPLE 2.11 


A 5-card poker hand dealt from a deck of 52 playing cards is said to be a full 
house if it consists of three of a kind and a pair. That is, three of the cards are 
of equal face value and the other two are also of equal face value. Find the 
probability of being dealt a full house. 


Solution 

The number of ways of being dealt a particular full house, say 3 kings and 
2 fives, is $\binom{4}{3}\binom{4}{2}$. Because there are 13 ways to select the face value for 
the three of a kind and, for each of these choices, there are 12 ways to 
select the face value for the pair, the total number of possible full houses 
is given by 

$$n = 13 \cdot 12 \cdot \binom{4}{3}\binom{4}{2} = 3{,}744$$

The total number of 5-card poker hands, all of which are equally likely, is 

$$N = \binom{52}{5} = 2{,}598{,}960$$

Therefore, by Theorem 2.2, the probability of event $A$ of getting a full house 
in a 5-card poker hand is 

$$P(A) = \frac{n}{N} = \frac{3{,}744}{2{,}598{,}960} \approx 0.0014$$
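The counting argument can be checked with Python's `math.comb` (available since Python 3.8). This is only a numerical restatement of the formulas in the solution, with an exact fraction for the final probability.

```python
from fractions import Fraction
from math import comb

# 13 face values for the three of a kind, 12 remaining face values for the pair,
# C(4,3) choices of suits for the triple and C(4,2) for the pair
n = 13 * 12 * comb(4, 3) * comb(4, 2)
N = comb(52, 5)                      # all 5-card hands, assumed equally likely

p_full_house = Fraction(n, N)        # Theorem 2.2: P(A) = n/N
print(n, N, p_full_house, round(float(p_full_house), 4))
```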
By using the three postulates of probability, we can derive many other rules 
which have important applications. Among the immediate consequences of the 
postulates, we prove the following theorems: 


THEOREM 2.3 If $A$ and $A'$ are complementary events in a sample space $S$, 
then 

$$P(A') = 1 - P(A)$$

Proof. In the second and third steps we make use of the definition 
of a complement in Appendix I, according to which $A$ and $A'$ are mutually 
exclusive and $A \cup A' = S$: 

$$\begin{aligned}
1 &= P(S) && \text{(by Postulate 2)} \\
&= P(A \cup A') \\
&= P(A) + P(A') && \text{(by Postulate 3)}
\end{aligned}$$

Therefore, $P(A') = 1 - P(A)$. 


In connection with the frequency interpretation, this result implies that if an 
event occurs, say, 37 percent of the time, it does not occur 63 percent of the time. 


THEOREM 2.4 $P(\varnothing) = 0$ for any sample space $S$. 

Proof. Since $S \cup \varnothing = S$ and the events $S$ and $\varnothing$ are mutually 
exclusive (see Appendix I at the end of the book), it follows that 

$$\begin{aligned}
P(S) &= P(S \cup \varnothing) \\
&= P(S) + P(\varnothing) && \text{(by Postulate 3)}
\end{aligned}$$

and, hence, that $P(\varnothing) = 0$. 


It is important to note that it does not follow from $P(A) = 0$ that $A$ is 
necessarily an empty set. In practice, we often assign a probability of 0 to events 
which, in colloquial terms, would not happen in a million years. For instance, 
there is the classical example that we assign a probability of 0 to the event that 
a monkey set loose on a typewriter will type Plato's Republic word for word 
without a mistake. As we shall see in Chapters 3 and 6, the fact that $P(A) = 0$ 
does not imply $A = \varnothing$ is especially relevant in the continuous case. 


THEOREM 2.5 If $A$ and $B$ are events in a sample space $S$ and $A \subset B$, then 
$P(A) \le P(B)$. 

Proof. Since $A \subset B$, we can write 

$$B = A \cup (A' \cap B)$$

as can easily be verified by means of a Venn diagram. Then, since $A$ and 
$A' \cap B$ are mutually exclusive, we get 

$$\begin{aligned}
P(B) &= P(A) + P(A' \cap B) && \text{(by Postulate 3)} \\
&\ge P(A) && \text{(by Postulate 1)}
\end{aligned}$$
In words, this theorem states that if event $A$ is a subset of event $B$, then $P(A)$ 
is not greater than $P(B)$. For instance, the probability of drawing a heart from 
an ordinary deck of 52 playing cards is not greater than the probability of drawing 
a red card, namely, $\frac{1}{4}$ compared to $\frac{1}{2}$. 


THEOREM 2.6 $0 \le P(A) \le 1$ for any event $A$. 

Proof. Using Theorem 2.5 and the fact that $\varnothing \subset A \subset S$ for any 
event $A$ in $S$, we have 

$$P(\varnothing) \le P(A) \le P(S)$$

Then, $P(\varnothing) = 0$ and $P(S) = 1$ lead to the result that 

$$0 \le P(A) \le 1$$
The third postulate of probability is sometimes referred to as a special 
addition rule; it is special in that the events $A_1, A_2, A_3, \ldots$ must all be mutually 
exclusive. For two events $A$ and $B$ there exists the more general addition rule: 


THEOREM 2.7 If $A$ and $B$ are any two events in a sample space $S$, then 

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Proof. Assigning the probabilities $a$, $b$, and $c$ to the mutually 
exclusive events $A \cap B$, $A \cap B'$, and $A' \cap B$ as in the Venn diagram of 
Figure 2.5, we find that 

$$\begin{aligned}
P(A \cup B) &= a + b + c \\
&= (a + b) + (c + a) - a \\
&= P(A) + P(B) - P(A \cap B)
\end{aligned}$$

Figure 2.5 Venn diagram for proof of Theorem 2.7. 
EXAMPLE 2.12 


If the probabilities are, respectively, 0.86, 0.35, and 0.29 that a family (randomly 
chosen for a sample survey in a large metropolitan area) will own a color television 
set, a black and white set, or both kinds of sets, what is the probability that such 
a family will own either kind of set? 


Solution 


If $A$ is the event that such a family owns a color television set and $B$ is the 
event that it owns a black and white set, we are given $P(A) = 0.86$, 
$P(B) = 0.35$, $P(A \cap B) = 0.29$, and substitution into the formula of Theorem 2.7 
yields 

$$P(A \cup B) = 0.86 + 0.35 - 0.29 = 0.92$$


EXAMPLE 2.13 


If the probabilities are, respectively, 0.23, 0.24, and 0.38 that a car stopped at a 
road block will have faulty brakes, badly worn tires, or faulty brakes and/or 
badly worn tires, what is the probability that such a car will have both faulty 
brakes and badly worn tires? 


Solution 
If $B$ is the event that such a car will have faulty brakes and $T$ is the event 
that it will have badly worn tires, we are given $P(B) = 0.23$, $P(T) = 0.24$, 
$P(B \cup T) = 0.38$, and substitution into the formula of Theorem 2.7 
yields 

$$0.38 = 0.23 + 0.24 - P(B \cap T)$$

Solving this equation for $P(B \cap T)$, we get 

$$P(B \cap T) = 0.23 + 0.24 - 0.38 = 0.09$$
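Examples 2.12 and 2.13 are one-line applications of Theorem 2.7, used once forward and once solved for the intersection. A small Python sketch with exact decimal-to-fraction conversion, so that results such as 0.92 are not disturbed by floating-point rounding; the helper names `union_prob` and `intersection_prob` are illustrative.

```python
from fractions import Fraction as F

def union_prob(p_a, p_b, p_ab):
    """Theorem 2.7: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_ab

def intersection_prob(p_a, p_b, p_aub):
    """Theorem 2.7 solved for the intersection: P(A ∩ B) = P(A) + P(B) - P(A ∪ B)."""
    return p_a + p_b - p_aub

color_or_bw = union_prob(F("0.86"), F("0.35"), F("0.29"))           # Example 2.12
brakes_and_tires = intersection_prob(F("0.23"), F("0.24"), F("0.38"))  # Example 2.13
print(float(color_or_bw), float(brakes_and_tires))
```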


By applying Theorem 2.7 repeatedly, we can generalize this addition rule so 
that it applies to any given number of events; for three events we get 
THEOREM 2.8 If $A$, $B$, and $C$ are any three events in a sample space $S$, then 

$$\begin{aligned}
P(A \cup B \cup C) = {} & P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) \\
& - P(B \cap C) + P(A \cap B \cap C)
\end{aligned}$$

Proof. Writing $A \cup B \cup C$ as $A \cup (B \cup C)$ and using Theorem 
2.7 twice, once for $P[A \cup (B \cup C)]$ and once for $P(B \cup C)$, we get 

$$\begin{aligned}
P(A \cup B \cup C) &= P[A \cup (B \cup C)] \\
&= P(A) + P(B \cup C) - P[A \cap (B \cup C)] \\
&= P(A) + P(B) + P(C) - P(B \cap C) - P[A \cap (B \cup C)]
\end{aligned}$$

From the first distributive law (see page 555), it follows that 

$$\begin{aligned}
P[A \cap (B \cup C)] &= P[(A \cap B) \cup (A \cap C)] \\
&= P(A \cap B) + P(A \cap C) - P[(A \cap B) \cap (A \cap C)] \\
&= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)
\end{aligned}$$

and, hence, that 

$$\begin{aligned}
P(A \cup B \cup C) = {} & P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) \\
& - P(B \cap C) + P(A \cap B \cap C)
\end{aligned}$$
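As a sanity check rather than a proof, the Python sketch below compares both sides of Theorem 2.8 on a hypothetical example: two balanced dice (36 equally likely outcomes) and three events chosen only for illustration. Python's set operators `|` and `&` play the roles of union and intersection.

```python
from fractions import Fraction
from itertools import product

# Hypothetical illustration: two balanced dice, 36 equally likely outcomes
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Theorem 2.2: P(event) = n/N for equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 6}          # first die shows 6
B = {s for s in S if sum(s) >= 9}        # total of at least 9
C = {s for s in S if s[0] == s[1]}       # doubles

direct = P(A | B | C)                    # count the union directly
formula = (P(A) + P(B) + P(C)
           - P(A & B) - P(A & C) - P(B & C)
           + P(A & B & C))               # right-hand side of Theorem 2.8
print(direct, formula, direct == formula)
```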


In Exercise 8 on page 46 the reader will be asked to give an alternative proof of 
Theorem 2.8, based on the method of proof used in the text for the proof of 
Theorem 2.7. 


EXAMPLE 2.14 


Suppose that if a person visits his dentist, the probability that he will have his 
teeth cleaned is 0.44, the probability that he will have a cavity filled is 0.24, the 
probability that he will have a tooth extracted is 0.21, the probability that he will 
have his teeth cleaned and a cavity filled is 0.08, the probability that he will 
have his teeth cleaned and a tooth extracted is 0.11, the probability that he 
will have a cavity filled and a tooth extracted is 0.07, and the probability that 
he will have his teeth cleaned, a cavity filled, and a tooth extracted is 0.03. What 
is the probability that a person visiting his dentist will have at least one of these 
things done to him? 


Solution 


If $C$ is the event that the person will have his teeth cleaned, $F$ is the event 
that he will have a cavity filled, and $E$ is the event that he will have a 
tooth extracted, we are given $P(C) = 0.44$, $P(F) = 0.24$, $P(E) = 0.21$, 
$P(C \cap F) = 0.08$, $P(C \cap E) = 0.11$, $P(F \cap E) = 0.07$, 
$P(C \cap F \cap E) = 0.03$, and substitution into the formula of Theorem 2.8 yields 

$$P(C \cup F \cup E) = 0.44 + 0.24 + 0.21 - 0.08 - 0.11 - 0.07 + 0.03 = 0.66$$
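The seven-term substitution above is one case of a pattern that extends to any number of events: add the single probabilities, subtract those of pairwise intersections, add those of triple intersections, and so on. A minimal Python sketch, assuming the probability of every intersection is supplied in a dictionary keyed by `frozenset`s of event labels (a representation chosen only for illustration):

```python
from fractions import Fraction as F
from itertools import combinations

def union_from_intersections(p, events):
    """Inclusion-exclusion: P(union of events) from the probabilities of all
    intersections. p maps a frozenset of event labels to the probability
    that all of those events occur."""
    total = F(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 else -1          # + for odd k, - for even k
        for combo in combinations(events, k):
            total += sign * p[frozenset(combo)]
    return total

# Example 2.14: C = cleaning, F = filling, E = extraction
given = {
    frozenset("C"): F("0.44"), frozenset("F"): F("0.24"), frozenset("E"): F("0.21"),
    frozenset("CF"): F("0.08"), frozenset("CE"): F("0.11"), frozenset("FE"): F("0.07"),
    frozenset("CFE"): F("0.03"),
}
print(float(union_from_intersections(given, "CFE")))
```

With four labels and all fifteen intersection probabilities, the same function evaluates the four-event formula of Exercise 9.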


THEORETICAL EXERCISES 


1. Refer to parts (c) and (d) of Exercise 3 on page 557 to show that 
(a) $P(A) \ge P(A \cap B)$; 
(b) $P(A) \le P(A \cup B)$. 

2. Show that $P(A \cap B') = P(A) - P(A \cap B)$. 

3. Show that $P(A' \cap B') = 1 - P(A) - P(B) + P(A \cap B)$. 

4. The event that “A or B but not both” will occur can be written $(A \cap B') \cup (A' \cap B)$. 
Express the probability of this event in terms of $P(A)$, $P(B)$, and $P(A \cap B)$. 

5. Use Theorem 2.7 to show that 
(a) $P(A \cap B) \le P(A) + P(B)$; 
(b) $P(A \cap B) \ge P(A) + P(B) - 1$. 

6. Show that if $P(A) = P(B) = P(C) = 1$, then $P(A \cap B \cap C) = 1$. [Hint: 
Assume that $P(A \cap B \cap C) \ne 1$ and show that this leads to a contradiction.] 

7. Give an alternative proof of Theorem 2.7 by making use of the relationships 
$A \cup B = A \cup (A' \cap B)$ and $B = (A \cap B) \cup (A' \cap B)$. 

8. By assigning the probabilities a, b, c, d, e, f, and g as in the Venn diagram of 
Figure 2.6, duplicate the method by which we proved Theorem 2.7 to prove 
Theorem 2.8. 

9. Duplicate the method of proof of Exercise 8 to show that 

$$\begin{aligned}
P(A \cup B \cup C \cup D) = {} & P(A) + P(B) + P(C) + P(D) \\
& - P(A \cap B) - P(A \cap C) - P(A \cap D) - P(B \cap C) - P(B \cap D) - P(C \cap D) \\
& + P(A \cap B \cap C) + P(A \cap B \cap D) + P(A \cap C \cap D) + P(B \cap C \cap D) \\
& - P(A \cap B \cap C \cap D)
\end{aligned}$$

(Hint: Divide each of the eight regions of the Venn diagram of Figure 2.6 into two parts, 
one inside D and one outside D, and assign the resulting regions the 
probabilities a, b, c, ..., n, o, and p.) 
Figure 2.6 Diagram for Exercise 8. 


10. Prove by induction that 

$$P(E_1 \cup E_2 \cup \cdots \cup E_n) \le \sum_{i=1}^{n} P(E_i)$$

for any finite sequence of events $E_1, E_2, \ldots$, and $E_n$. 

11. The odds that an event will occur are given by the ratio of the probability 
that the event will occur to the probability that it will not occur; they are 
usually quoted in terms of positive integers having no common factor. If the 
odds that an event will occur are $a$ to $b$, show that its probability is $p = \frac{a}{a + b}$. 

12. Subjective probabilities may be determined by exposing persons to risk-taking 
situations and finding the odds at which they would consider it fair to bet 
on the outcome. The odds are then converted into probabilities by means of 
the formula of the preceding exercise. For instance, if a person feels that 3 
to 2 are fair odds that a business venture will succeed (or that it would be 
fair to bet $30 against $20 that it will succeed), the probability is $\frac{3}{3 + 2} = 0.6$ 
that the business venture will succeed. 
(a) Show that if subjective probabilities are determined in this way, they 
satisfy Postulate 1 on page 36. 
(b) If a person feels that $a$ to $b$ are fair odds that event $A$ will occur, this 
implies that the odds are $b$ to $a$ that event $A$ will not occur. Thus, 

$$P(A) = \frac{a}{a + b} \quad \text{and} \quad P(A') = \frac{b}{a + b}$$

and it should be observed that the sum of these two probabilities is 1. Under what condition can this 
argument be used to show that subjective probabilities determined in 
this way satisfy Postulate 2 on page 36? 


See also Exercise 17 on page 49. 
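The conversion from odds to probabilities described in Exercises 11 and 12 can be sketched in Python; `odds_to_probability` is a hypothetical helper name, and it simply evaluates $a/(a + b)$ with exact fractions.

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Probability of an event when the odds in its favor are a to b: a/(a + b)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return Fraction(a, a + b)

p = odds_to_probability(3, 2)                 # business venture of Exercise 12
q = odds_to_probability(2, 3)                 # odds of 2 to 3 that it will not succeed
print(p, float(p), p + q)                     # P(A), as a decimal, and P(A) + P(A')
```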


APPLIED EXERCISES 


13. An experiment has five possible outcomes, A, B, C, D, and E. Check for each 
of the following whether it constitutes a permissible assignment of probability 
and explain your answers: 
(a) P(A) = 0.20, P(B) = 0.20, P(C) = 0.20, P(D) = 0.20, and P(E) = 0.20; 
(b) P(A) = 0.21, P(B) = 0.26, P(C) = 0.58, P(D) = 0.01, and P(E) = 0.06; 
(c) P(A) = 0.18, P(B) = 0.19, P(C) = 0.20, P(D) = 0.21, and P(E) = 0.22; 
(d) P(A) = 0.10, P(B) = 0.30, P(C) = 0.10, P(D) = 0.60, and P(E) = -0.10; 
(e) P(A) = 0.23, P(B) = 0.12, P(C) = 0.05, P(D) = 0.50, and P(E) = 0.08. 

14. If A and B are mutually exclusive events, P(A) = 0.37 and P(B) = 0.44, find 
(a) $P(A')$; (b) $P(B')$; (c) $P(A \cup B)$; 
(d) $P(A \cap B)$; (e) $P(A \cap B')$; (f) $P(A' \cap B)$. 

15. Explain why there must be a mistake in each of the following statements: 
(a) The probability that Jean will pass the bar examination is 0.66 and the 
probability that she will not pass is -0.34. 
(b) The probability that the home team will win an upcoming football game 
is 0.77, the probability that it will tie the game is 0.08, and the probability 
that it will win or tie the game is 0.95. 
(c) The probabilities that a secretary will make 0, 1, 2, 3, 4, or 5 or more 
mistakes in typing a report are, respectively, 0.12, 0.25, 0.36, 0.14, 0.09, 
and 0.07. 
(d) The probabilities that a bank will get 0, 1, 2, or 3 or more bad checks 
on any given day are, respectively, 0.08, 0.21, 0.29, and 0.40. 

16. Supposing that each of the 30 points of the sample space of Exercise 5 on 
page 34 is assigned the probability $\frac{1}{30}$, find the probabilities that at a given 
moment 
(a) at least one of the station wagons is empty; 
(b) each of the two station wagons carries the same number of passengers; 
(c) the larger station wagon carries more passengers than the smaller station 
wagon; 
(d) together they carry at least six passengers. 


17. If subjective probabilities are determined as in Exercise 12, it does not follow 
that Postulate 3 on page 36 must necessarily be satisfied. However, proponents 
of the subjective probability concept generally impose this postulate as a 
consistency criterion; in other words, they regard subjective probabilities 
which do not satisfy Postulate 3 as inconsistent. 
(a) The branch manager of a bank feels that the odds are 7 to 5 against 
her getting a $1,000 bonus and 11 to 1 against her getting a $2,000 
bonus. Furthermore, she feels that it is an even-money bet (the odds 
are 1 to 1) that she will get one or the other. Are the corresponding 
subjective probabilities consistent? 
(b) There are two Porsches in a race, and a reporter feels that the odds 
against their winning are, respectively, 3 to 1 and 4 to 1. To be consistent, 
what odds should he assign to the event that either car will win? 

18. The probabilities that the serviceability of a new X-ray machine will be rated 
very difficult, difficult, average, easy, or very easy are, respectively, 0.12, 0.17, 
0.34, 0.29, and 0.08. Find the probabilities that the serviceability of the 
machine will be rated 
(a) difficult or very difficult; 
(b) neither very difficult nor very easy; 
(c) average or worse; 
(d) average or better. 

19. A police department needs new tires for its patrol cars and the probabilities 
are 0.15, 0.24, 0.03, 0.28, 0.22, and 0.08 that it will buy Uniroyal tires, Goodyear 
tires, Michelin tires, General tires, Goodrich tires, or Armstrong tires. Find 
the probabilities that it will buy 
(a) Goodyear or Goodrich tires; 
(b) Uniroyal, Michelin, or Goodrich tires; 
(c) Michelin or Armstrong tires; 
(d) Uniroyal, Michelin, General, or Goodrich tires. 

20. If each card of an ordinary deck of 52 playing cards has the same probability 
of being drawn, what is the probability of drawing 
(a) a red jack; 
(b) a 3, 4, 5, 6, or 8; 
(c) a red king or a black ace? 

21. A hat contains twenty white slips of paper numbered from 1 through 20, ten 
red slips of paper numbered from 1 through 10, forty yellow slips of paper 
numbered from 1 through 40, and ten blue slips of paper numbered from 1 
through 10. If these 80 slips of paper are thoroughly shuffled so that each 
slip has the same probability of being drawn, find the probabilities of drawing 
a slip of paper which is 
(a) blue or white; 

(b) numbered 1, 2, 3, 4, or 5; 
(c) red or yellow and numbered 1, 2, 3, or 4; 
(d) numbered 5, 15, 25, or 35; 
(e) white and numbered higher than 12 or yellow and numbered higher 
than 26. 


22. Four candidates are seeking a vacancy on a school board. If A is twice as 
likely to be elected as B, and B and C are given about the same chance of 
being elected, while C is twice as likely to be elected as D, what are the 
probabilities that 
(a) C wins; 
(b) A does not win? 

23. Two cards are randomly selected from a deck of 52 playing cards. Use 
Theorem 1.7 to find the probability that both cards are greater than 3 and 
less than 8. 

24. In a poker game where 5 cards are dealt at random from a deck of 52 playing 
cards, find the probability of getting 
(a) two pairs (any two distinct face values occurring exactly twice); 
(b) four of a kind (four cards of equal face value). 

25. In a game of “Yahtzee,” where five dice are tossed simultaneously, find the 
probabilities of getting 
(a) two pairs; 
(b) three of a kind; 
(c) a full house (three of a kind and a pair); 
(d) four of a kind. 

26. Among the 78 doctors on the staff of a hospital, 64 carry malpractice insurance, 
36 are surgeons, and 34 of the surgeons carry malpractice insurance. If one 
of these doctors is chosen by lot to represent the hospital staff at an A.M.A. 
convention (that is, each doctor has a probability of $\frac{1}{78}$ of being selected), 
what is the probability that the one chosen is not a surgeon and does not 
carry malpractice insurance? 

27. Refer to Exercises 1 and 5 to explain why there must be a mistake in each 
of the following statements: 
(a) The probability that it will rain is 0.67 and the probability that it will 
rain or snow is 0.55. 
(b) The probability that a student will get a passing grade in English is 0.82 
and the probability that she will get a passing grade in English and 
French is 0.86. 
(c) The probability that a person visiting the San Diego Zoo will see the 
giraffes is 0.72, the probability that he will see the bears is 0.84, and the 
probability that he will see both is 0.52. 
28. Given $P(A) = 0.59$, $P(B) = 0.30$, and $P(A \cap B) = 0.21$, find 
(a) $P(A \cup B)$; (b) $P(A \cap B')$; 
(c) $P(A' \cup B')$; (d) $P(A' \cap B)$. 

29. For married couples living in a certain suburb, the probability that the 
husband will vote in a school board election is 0.21, the probability that his 
wife will vote in the election is 0.28, and the probability that they will both 
vote is 0.15. What is the probability that at least one of them will vote? 

30. A biology professor has two graduate assistants helping him with his research. 
The probability that the older of the two assistants will be absent on any 
given day is 0.08, the probability that the younger of the two will be absent 
on any given day is 0.05, and the probability that they will both be absent 
on any given day is 0.02. Find the probabilities that 
(a) either or both of the graduate assistants will be absent on any given day; 
(b) at least one of the two graduate assistants will not be absent on any 
given day; 
(c) only one of the two graduate assistants will be absent on any given day. 

31. At Roanoke College it is known that 3 of the students live off campus. It is 
also known that $ of the students are from within the state of Virginia and 
that å of the students are from out-of-state or live in the dormitories. What 
is the probability that a student selected at random from Roanoke College 
is from out-of-state and lives on campus? 


32. Suppose that if a person visits Disneyland, the probability that he will go on 
the Jungle Cruise is 0.74, the probability that he will ride the Monorail is 
0.70, the probability that he will go on the Matterhorn ride is 0.62, the 
probability that he will go on the Jungle Cruise and ride the Monorail is 
0.52, the probability that he will go on the Jungle Cruise as well as the 
Matterhorn ride is 0.46, the probability that he will ride the Monorail and 
go on the Matterhorn ride is 0.44, and the probability that he will go on all 
three of these rides is 0.34. What is the probability that a person visiting 
Disneyland will go on at least one of these three rides? 

33. Suppose that if a person travels to Europe for the first time, the probability 
that he will see London is 0.70, the probability that he will see Paris is 0.64, 
the probability that he will see Rome is 0.58, the probability that he will see 
Amsterdam is 0.58, the probability that he will see London and Paris is 0.45, 
the probability that he will see London and Rome is 0.42, the probability 
that he will see London and Amsterdam is 0.41, the probability that he will 
see Paris and Rome is 0.35, the probability that he will see Paris and 
Amsterdam is 0.39, the probability that he will see Rome and Amsterdam is 
0.32, the probability that he will see London, Paris, and Rome is 0.23, the 
probability that he will see London, Paris, and Amsterdam is 0.26, the 
probability that he will see London, Rome, and Amsterdam is 0.21, 
the probability that he will see Paris, Rome, and Amsterdam is 0.20, 
and the probability that he will see all four of these cities is 0.12. What is 
the probability that a person traveling to Europe for the first time will see at 
least one of these four cities? (Hint: Use the formula of Exercise 9.) 


2.6 CONDITIONAL PROBABILITY 


Difficulties can easily arise when probabilities are quoted without specification 
of the sample space. For instance, if we ask for the probability that a lawyer 
makes more than $50,000 per year, we may well get several different answers, 
and they may all be correct. One of them might apply to all law school graduates, 
another might apply to all persons licensed to practice law, a third might apply 
to all those who are actively engaged in the practice of law, and so forth. Since 
the choice of the sample space (namely, the set of all possibilities under consider- 
ation) is by no means always self-evident, it often helps to use the symbol P(A|S) 
to denote the conditional probability of event A relative to the sample space S, 
or as we also call it “the probability of A given S.” The symbol P(A|S) makes 
it explicit that we are referring to a particular sample space S, and it is preferable 
to the abbreviated notation P(A) unless the tacit choice of S is clearly understood. 
It is also preferable when we want to refer to several sample spaces in the same 
example. If A is the event that a person makes more than $50,000 per year, G 
is the event that a person is a law school graduate, L is the event that a person 
is licensed to practice law, and E is the event that a person is actively engaged 
in the practice of law, then P(A|G) is the probability that a law school graduate 
makes more than $50,000 per year, P(A|L) is the probability that a person licensed 
to practice law makes more than $50,000 per year, and P(A|E) is the probability 
that a person actively engaged in the practice of law makes more than $50,000 
per year. 

Some ideas connected with conditional probabilities are illustrated in the 
following example: 


EXAMPLE 2.15 


A consumer research organization has studied the services under warranty pro- 
vided by the 50 new car dealers in a certain city, and its findings are summarized 
in the following table: 


                                    Good service      Poor service
                                    under warranty    under warranty
In business ten years or more            16                 4
In business less than ten years          10                20
If a person randomly selects one of these new car dealers, what is the probability
that he gets one who provides good service under warranty? Also, if a person
randomly selects one of the dealers who has been in business for ten years or 
more, what is the probability that he gets one who provides good service under 
warranty? 


Solution 


By “randomly” we mean that, in each case, all possible selections are equally 
likely, and we can therefore use the formula of Theorem 2.2. If we let G 
denote the selection of a dealer who provides good service under warranty, 
and if we let n(G) denote the number of elements in G, and n(S) the 
number of elements in the whole sample space, we get 


$$P(G) = \frac{n(G)}{n(S)} = \frac{16+10}{50} = 0.52$$


This answers the first question. 
For the second question, we limit ourselves to the reduced sample
space which consists of the first line of the table, namely, the 16 + 4 = 20
dealers who have been in business ten years or more. Of these, 16 provide
good service under warranty, and we get

$$P(G|T) = \frac{16}{20} = 0.80$$

where T denotes the selection of a dealer who has been in business ten
years or more. This answers the second question, and as should have been
expected, P(G|T) is considerably higher than P(G). ▲


Since the numerator of P(G|T) is n(T ∩ G) = 16 in the preceding example,
the number of dealers who have been in business for ten years or more and
provide good service under warranty, and the denominator is n(T), the number
of dealers who have been in business ten years or more, we can write symbolically

$$P(G|T) = \frac{n(T\cap G)}{n(T)}$$


Then, if we divide the numerator and the denominator by n(S), the total number 
of new car dealers in the given city, we get 


$$P(G|T) = \frac{n(T\cap G)/n(S)}{n(T)/n(S)} = \frac{P(T\cap G)}{P(T)}$$

and we have, thus, expressed the conditional probability P(G|T) in terms of two
probabilities defined for the whole sample space S. 
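As a check on Example 2.15 and the derivation above, here is a minimal Python sketch. The cell labels and the helper `n` are illustrative choices, not notation from the text; the code counts dealers in the reduced sample space T and confirms that this count ratio equals P(T ∩ G)/P(T) computed on the whole sample space S.

```python
from fractions import Fraction

# Table of Example 2.15: (age of dealership, warranty service) -> number of dealers.
# "T" = in business ten years or more, "T'" = less than ten years.
counts = {
    ("T", "good"): 16, ("T", "poor"): 4,
    ("T'", "good"): 10, ("T'", "poor"): 20,
}
n_S = sum(counts.values())  # 50 dealers in the whole sample space S


def n(pred):
    """Number of dealers in the cells that satisfy the predicate."""
    return sum(c for cell, c in counts.items() if pred(cell))


n_G = n(lambda cell: cell[1] == "good")       # good service: 16 + 10
n_T = n(lambda cell: cell[0] == "T")          # ten years or more: 16 + 4
n_TG = n(lambda cell: cell == ("T", "good"))  # both: 16

p_G = Fraction(n_G, n_S)
# Counting within the reduced sample space T ...
p_G_given_T = Fraction(n_TG, n_T)
# ... agrees with the ratio of two probabilities defined on all of S.
ratio_of_probs = Fraction(n_TG, n_S) / Fraction(n_T, n_S)

print(float(p_G), float(p_G_given_T), p_G_given_T == ratio_of_probs)
# prints: 0.52 0.8 True
```

Exact `Fraction` arithmetic is used so that the equality test compares rational numbers rather than rounded floats.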

Generalizing from the above, let us now make the following definition of 
conditional probability: 


DEFINITION 2.1. If A and B are any two events in a sample space S and
P(A) ≠ 0, the conditional probability of B given A is

$$P(B|A) = \frac{P(A\cap B)}{P(A)}$$


EXAMPLE 2.16 


With reference to Example 2.15, what is the probability that one of the dealers
who has been in business less than ten years will provide good service under
warranty?

Solution

Since P(T' ∩ G) = 10/50 = 0.20 and P(T') = (10 + 20)/50 = 0.60, substitution
into the formula yields

$$P(G|T') = \frac{P(T'\cap G)}{P(T')} = \frac{0.20}{0.60} = \frac{1}{3}$$


Although we justified the formula of Definition 2.1 with an example in 
which the possibilities were all equally likely, this is not a requirement for its use. 


EXAMPLE 2.17 


With reference to the loaded die of Example 2.9, what is the probability that the 
number of points rolled is a perfect square? Also, what is the probability that it 
is a perfect square given that it is greater than 3? 


Solution 


If A is the event that the number of points rolled is greater than 3 and B
is the event that it is a perfect square, we have A = {4, 5, 6}, B = {1, 4},
and A ∩ B = {4}. Since the probabilities of rolling a 1, 2, 3, 4, 5, or 6 with
the die are 2/9, 1/9, 2/9, 1/9, 2/9, and 1/9 (see page 38), we find that the answer
to the first question is

$$P(B) = \frac{2}{9} + \frac{1}{9} = \frac{1}{3}$$

To determine P(B|A), we first calculate

$$P(A\cap B) = \frac{1}{9} \quad\text{and}\quad P(A) = \frac{1}{9} + \frac{2}{9} + \frac{1}{9} = \frac{4}{9}$$

Then, substituting into the formula of Definition 2.1, we get

$$P(B|A) = \frac{P(A\cap B)}{P(A)} = \frac{1/9}{4/9} = \frac{1}{4}$$


Originally, we justified Definition 2.1 with an example in which the
possibilities were all equally likely. To show that it yields the "right" answer
here, where the possibilities are not all equally likely, we need only assign a
probability of v to each of the two even numbers and a probability of 2v to the odd
number in the reduced sample space A, such that the sum of the three probabilities
is 1. We then have v + 2v + v = 4v = 1, so that v = 1/4, and hence
P(B|A) = 1/4 as before. ▲
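Definition 2.1 applies equally when the outcomes are not equally likely. The sketch below assumes, as the probabilities quoted in Example 2.17 imply, that each odd face of the loaded die has probability 2/9 and each even face 1/9; the helper names `P` and `P_given` are hypothetical.

```python
from fractions import Fraction

# Loaded die (Example 2.9 as used in Example 2.17): odd faces 2/9, even faces 1/9.
prob = {face: Fraction(2, 9) if face % 2 else Fraction(1, 9) for face in range(1, 7)}


def P(event):
    """Probability of an event given as a set of faces."""
    return sum((prob[face] for face in event), Fraction(0))


def P_given(B, A):
    """P(B|A) = P(A ∩ B) / P(A), as in Definition 2.1."""
    if P(A) == 0:
        raise ValueError("P(A) must be nonzero")
    return P(A & B) / P(A)


A = {4, 5, 6}  # more than 3 points
B = {1, 4}     # a perfect square
print(P(B), P_given(B, A))  # prints: 1/3 1/4
```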


EXAMPLE 2.18 


A manufacturer of airplane parts knows from past experience that the probability 
is 0.80 that an order will be ready for shipment on time, and it is 0.72 that an 
order will be ready for shipment on time and will also be delivered on time. 
What is the probability that such an order will be delivered on time given that 
it was ready for shipment on time? 


Solution 
If we let R stand for the event that an order is ready for shipment on time
and D for the event that it is delivered on time, we have P(R) = 0.80 and
P(R ∩ D) = 0.72, and it follows that

$$P(D|R) = \frac{P(R\cap D)}{P(R)} = \frac{0.72}{0.80} = 0.90$$


Thus, 90 percent of the shipments will be delivered on time provided they
are ready for shipment on time. Note that P(R|D), the probability that a shipment
which is delivered on time was also ready for shipment on time, cannot be
determined without further information; for this purpose we would also
have to know P(D). ▲


Multiplying the expressions on both sides of the formula of Definition 2.1
by P(A), we obtain the following multiplication rule:

THEOREM 2.9 If A and B are any two events in a sample space S and
P(A) ≠ 0, then

$$P(A\cap B) = P(A)\cdot P(B|A)$$

In words, the probability that A and B will both occur is the product of the
probability of A and the conditional probability of B given A. Alternatively, if
P(B) ≠ 0, it is the product of the probability of B and the conditional probability
of A given B; symbolically, P(A ∩ B) = P(B) · P(A|B). To derive this alternative
form, we interchange A and B in the formula of Theorem 2.9 and make use
of the fact that A ∩ B = B ∩ A.


EXAMPLE 2.19


If we randomly pick two television tubes in succession from a shipment of 240
television tubes of which 15 are defective, what is the probability that they will
both be defective?


Solution


If we assume equal probabilities for each selection (which is what we mean
by "randomly" picking the tubes), the probability that the first tube will be
defective is 15/240, and the probability that the second tube will be defective
given that the first tube is defective is 14/239. Thus, the probability that both
tubes will be defective is (15/240) · (14/239) = 7/1912. This assumes that we are
sampling without replacement, namely, that the first tube is not replaced before
the second tube is picked. ▲


EXAMPLE 2.20

Find the probability of randomly drawing two aces in succession from an ordinary 
deck of 52 playing cards (a) if we sample without replacement, and (b) if we 
sample with replacement. 
Solution


(a) If the first card is not replaced before the second card is drawn, the
probability of getting two aces in succession is

$$\frac{4}{52}\cdot\frac{3}{51} = \frac{1}{221}$$

(b) If the first card is replaced before the second card is drawn, the
corresponding probability is

$$\frac{4}{52}\cdot\frac{4}{52} = \frac{1}{169}$$


In the situations described in the two preceding examples there is a definite
temporal order between the two events A and B. In general, this need not be the
case when we write P(A|B) or P(B|A). For instance, we could ask for the
probability that the first card drawn was an ace given that the second card drawn
(without replacement) is an ace; the answer would also be 3/51.
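The multiplication rule can be checked against brute-force counting. In this sketch the 52 cards are encoded as the integers 0 to 51, with 0 to 3 standing for the aces (an arbitrary encoding, not from the text), and every ordered pair of distinct cards is treated as equally likely.

```python
from fractions import Fraction
from itertools import permutations

# Theorem 2.9 applied to Examples 2.19 and 2.20.
tubes = Fraction(15, 240) * Fraction(14, 239)     # both tubes defective
aces_without = Fraction(4, 52) * Fraction(3, 51)  # Example 2.20(a)
aces_with = Fraction(4, 52) * Fraction(4, 52)     # Example 2.20(b)


def is_ace(card):
    """Cards 0-3 represent the four aces in this encoding."""
    return card < 4


# All ordered draws of two distinct cards (sampling without replacement).
pairs = list(permutations(range(52), 2))
both = sum(1 for first, second in pairs if is_ace(first) and is_ace(second))
second_ace = sum(1 for _, second in pairs if is_ace(second))

print(tubes, aces_without, aces_with)  # prints: 7/1912 1/221 1/169
print(Fraction(both, len(pairs)))      # prints: 1/221, matching the rule
# P(first is an ace | second is an ace): no temporal order is needed.
print(Fraction(both, second_ace))      # prints: 1/17, that is, 3/51
```

Counting ordered pairs makes the last line a direct application of Definition 2.1 to the reduced sample space "second card is an ace."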

Theorem 2.9 can easily be generalized so that it applies to more than two 
events; for instance, for three events we have: 


THEOREM 2.10 If A, B, and C are any three events in a sample space S,
such that P(A) ≠ 0 and P(A ∩ B) ≠ 0, then

$$P(A\cap B\cap C) = P(A)\cdot P(B|A)\cdot P(C|A\cap B)$$

Proof. Writing A ∩ B ∩ C as (A ∩ B) ∩ C and using the formula
of Theorem 2.9 twice, we get

$$\begin{aligned} P(A\cap B\cap C) &= P[(A\cap B)\cap C] \\ &= P(A\cap B)\cdot P(C|A\cap B) \\ &= P(A)\cdot P(B|A)\cdot P(C|A\cap B) \end{aligned}$$

▼


Further generalization of Theorems 2.9 and 2.10 to k events is now straightfor- 
ward, and the resulting formula can be proved by mathematical induction. 
EXAMPLE 2.21

A box of fuses contains 20 fuses, of which 5 are defective. If 3 of the fuses are
selected at random and removed from the box in succession without replacement,
what is the probability that all three fuses are defective?

Solution

If A is the event that the first fuse is defective, B is the event that the second
fuse is defective, and C is the event that the third fuse is defective, then
P(A) = 5/20, P(B|A) = 4/19, P(C|A ∩ B) = 3/18, and substitution into the
formula yields

$$P(A\cap B\cap C) = \frac{5}{20}\cdot\frac{4}{19}\cdot\frac{3}{18} = \frac{1}{114}$$

▲

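Theorem 2.10 and its extension to k events amount to multiplying successive conditional probabilities. The function below is a hypothetical helper (not from the text) for the special case of drawing without replacement, as in Example 2.21; a brute-force enumeration of ordered triples serves as a cross-check.

```python
from fractions import Fraction
from itertools import permutations


def all_special_prob(total, special, draws):
    """P(all `draws` successive draws without replacement are special items).

    Generalized multiplication rule: the factors are special/total,
    (special - 1)/(total - 1), ...  Assumes 0 <= draws <= total.
    """
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(special - i, total - i)
    return p


# Example 2.21: 20 fuses, 5 defective, 3 drawn.
print(all_special_prob(20, 5, 3))  # prints: 1/114

# Cross-check: fuses 0-4 are the defective ones; count ordered triples.
triples = list(permutations(range(20), 3))
hits = sum(1 for t in triples if all(f < 5 for f in t))
print(Fraction(hits, len(triples)))  # prints: 1/114
```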
2.7 INDEPENDENT EVENTS 


Informally speaking, two events A and B are said to be independent if the
occurrence or nonoccurrence of either of them does not affect the probability of
the occurrence of the other. For instance, if in Example 2.21 each fuse is replaced
before the next one is randomly drawn, the outcomes of successive selections
are all independent: the probability of getting a defective fuse remains 5/20 = 1/4
in each case.

Symbolically, two events A and B are independent if P(B|A) = P(B) and
P(A|B) = P(A), and it can be shown that either of these equalities implies the
other when both of the conditional probabilities exist, namely, when neither
P(A) nor P(B) equals zero (see Exercise 6 on page 67).

Now, if we substitute P(B) for P(B|A) into the formula of Theorem 2.9,
we get

$$P(A\cap B) = P(A)\cdot P(B|A) = P(A)\cdot P(B)$$


and we shall use this as our formal definition of independence. 


DEFINITION 2.2 Two events A and B are independent if and only if

$$P(A\cap B) = P(A)\cdot P(B)$$

If two events are not independent, they are said to be dependent. In the derivation
of the formula of Definition 2.2 we assumed that P(B|A) exists and, hence, that
P(A) ≠ 0. For mathematical convenience, we shall let the definition apply also
when P(A) = 0 and/or P(B) = 0. 


EXAMPLE 2.22 


A coin is tossed three times and the eight possible outcomes, HHH, HHT, HTH,
THH, HTT, THT, TTH, and TTT, are assumed to be equally likely. If A is the
event that a head occurs on each of the first two tosses, B is the event that a tail 
occurs on the third toss, and C is the event that exactly two tails occur in the 
three tosses, show that events A and B are independent whereas B and C are 
dependent. 


Solution 


Since 


A = {HHH, HHT}
B = {HHT, HTT, THT, TTT}
C = {HTT, THT, TTH}
A ∩ B = {HHT}
B ∩ C = {HTT, THT}


the assumption that the eight possible outcomes are all equiprobable yields
P(A) = 1/4, P(B) = 1/2, P(C) = 3/8, P(A ∩ B) = 1/8, and P(B ∩ C) = 1/4.
Then, since P(A) · P(B) = 1/4 · 1/2 = 1/8 equals P(A ∩ B), the events A and
B are independent, and since P(B) · P(C) = 1/2 · 3/8 = 3/16 does not equal
P(B ∩ C) = 1/4, the events B and C are dependent. ▲
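Definition 2.2 can be checked mechanically by listing the sample space. This sketch enumerates the eight equally likely outcomes of Example 2.22 as strings such as "HHT"; the function names are illustrative, not from the text.

```python
from fractions import Fraction
from itertools import product

# The eight equally likely outcomes of three tosses, e.g. "HHT".
S = ["".join(t) for t in product("HT", repeat=3)]


def P(event):
    """Equally likely outcomes: P(E) = n(E) / n(S)."""
    return Fraction(len(event), len(S))


def independent(E, F):
    """Definition 2.2: E and F are independent iff P(E ∩ F) = P(E) · P(F)."""
    return P(E & F) == P(E) * P(F)


A = {s for s in S if s[:2] == "HH"}      # heads on each of the first two tosses
B = {s for s in S if s[2] == "T"}        # tail on the third toss
C = {s for s in S if s.count("T") == 2}  # exactly two tails

print(independent(A, B), independent(B, C))  # prints: True False
```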


With regard to Definition 2.2 it can be shown that either, or both, events can be 
replaced by their complements. For instance, 


THEOREM 2.11. If the two events A and B are independent, then the two
events A and B' are also independent.
Proof. Since A = (A ∩ B) ∪ (A ∩ B') and A ∩ B and A ∩ B' are
mutually exclusive, we have

$$P(A) = P[(A\cap B)\cup(A\cap B')] = P(A\cap B) + P(A\cap B')$$

or, since A and B are independent,

$$P(A) = P(A)\cdot P(B) + P(A\cap B')$$

It follows that

$$P(A\cap B') = P(A)\cdot[1 - P(B)] = P(A)\cdot P(B')$$

and, hence, that A and B' are independent. In Exercise 5 on page 67 the
reader will be asked to show that if A and B are independent, then A' and
B, and A' and B', are also independent. ▼


To extend the concept of independence to more than two events, let us 
make the following definition: 


DEFINITION 2.3 Events A₁, A₂, ..., and Aₖ are independent if and only if
the probability of the intersection of any 2, 3, ..., or k of these events equals
the product of their respective probabilities.


For three events A, B, and C, for example, independence requires that
P(A ∩ B) = P(A) · P(B), P(A ∩ C) = P(A) · P(C), P(B ∩ C) = P(B) · P(C),
and P(A ∩ B ∩ C) = P(A) · P(B) · P(C).

It is of interest to note that three or more events can be pairwise independent
without being independent.


EXAMPLE 2.23 


Consider three events A, B, and C in a sample space S, with probabilities assigned
as in the Venn diagram of Figure 2.7. Show that A and B are independent, A
and C are independent, B and C are independent, but A, B, and C are not
independent.
Solution


As can be seen from the diagram, P(A) = P(B) = P(C) = 1/2,
P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 1/4, and P(A ∩ B ∩ C) = 1/4. Thus,
P(A) · P(B) = 1/4 = P(A ∩ B), P(A) · P(C) = 1/4 = P(A ∩ C), and
P(B) · P(C) = 1/4 = P(B ∩ C), but P(A) · P(B) · P(C) = 1/8 ≠ P(A ∩ B ∩ C),
and this proves what we set out to show. ▲


Incidentally, the preceding example can be given a "real" interpretation
by considering a large room which has three separate switches controlling the
ceiling lights. These lights will be on when all three switches are "up," and hence
also when one of the switches is "up" and the other two are "down." If A is the
event that the first switch is "up," B is the event that the second switch is "up,"
and C is the event that the third switch is "up," the Venn diagram of Figure 2.7
shows a possible set of probabilities associated with the switches being "up" or
"down" when the ceiling lights are on.

It can also happen that P(A ∩ B ∩ C) = P(A) · P(B) · P(C) without
A, B, and C being pairwise independent; the reader will be asked to verify this
in part (a) of Exercise 7 on page 67.
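Definition 2.3 requires every subcollection of two or more events to factor, not just the pairs. The sketch below checks this for Example 2.23 under an assumption: since Figure 2.7 is not reproduced here, it uses one probability assignment consistent with the values quoted in the solution (1/4 on A ∩ B ∩ C and 1/4 on each region where exactly one of the events occurs), which also matches the light-switch interpretation. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Assumed assignment: each key is the set of events occurring at that point of S.
points = {
    frozenset("ABC"): Fraction(1, 4),
    frozenset("A"): Fraction(1, 4),
    frozenset("B"): Fraction(1, 4),
    frozenset("C"): Fraction(1, 4),
}


def P_all(names):
    """Probability that every event named in `names` occurs."""
    wanted = set(names)
    return sum((p for pt, p in points.items() if wanted <= pt), Fraction(0))


def independent(names):
    """Definition 2.3: each intersection of 2, 3, ..., k of the events factors."""
    return all(
        P_all(sub) == prod(P_all(e) for e in sub)
        for r in range(2, len(names) + 1)
        for sub in combinations(names, r)
    )


for group in ("AB", "AC", "BC", "ABC"):
    print(group, independent(group))
# prints AB True, AC True, BC True, then ABC False, one per line
```

Because `independent` loops over all subsets of size 2 through k, it checks exactly the 2^k − k − 1 product conditions that the definition imposes.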

Of course, if certain events are given as independent, the probability that 
they will all occur is simply the product of their respective probabilities. 


Figure 2.7 Venn diagram for Example 2.23.


EXAMPLE 2.24 


Find the probability of getting three heads in three (independent) tosses of a 
balanced coin, and also the probability of first rolling four fives and then another 
number in five (independent) rolls of a fair die. 
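Solution


Multiplying the respective probabilities, we get

$$\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{8}$$

for the probability of three heads, and

$$\frac{1}{6}\cdot\frac{1}{6}\cdot\frac{1}{6}\cdot\frac{1}{6}\cdot\frac{5}{6} = \frac{5}{7776}$$

for the probability of first rolling four fives and then another number. ▲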
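The two products in Example 2.24 are easy to confirm with exact rational arithmetic; a brief Python sketch:

```python
from fractions import Fraction
from math import prod

# Independent trials: multiply the probabilities of the individual outcomes.
three_heads = prod([Fraction(1, 2)] * 3)
# Four fives (1/6 each), then any number other than five (5/6).
four_fives_then_other = prod([Fraction(1, 6)] * 4) * Fraction(5, 6)

print(three_heads, four_fives_then_other)  # prints: 1/8 5/7776
```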


2.8 BAYES' THEOREM


There are many problems in which the ultimate outcome of an experiment depends 
on what happens in various intermediate stages. 


EXAMPLE 2.25 


In the simplest case there is one intermediate stage consisting of two alternatives. 
Suppose, for instance, that we are concerned with the completion of a highway 
construction job, which may be delayed because of a strike. Suppose, furthermore, 
that the probabilities are 0.60 that there will be a strike, 0.85 that the job will be 
completed on time if there is no strike, and 0.35 that the job will be completed 
on time if there is a strike. What is the probability that the job will be completed 
on time? 


Solution 
If A is the event that the job will be completed on time and B is the event
that there will be a strike, the given information can be written as P(B) =
0.60, P(A|B) = 0.35, and P(A|B') = 0.85. Since A is the union of the two
mutually exclusive events A ∩ B and A ∩ B' (see Figure 2.8), we can write

$$\begin{aligned} P(A) &= P[(A\cap B)\cup(A\cap B')] \\ &= P(A\cap B) + P(A\cap B') \\ &= P(B)\cdot P(A|B) + P(B')\cdot P(A|B') \end{aligned}$$

Then, substitution of the given numerical values yields

$$P(A) = (0.60)(0.35) + (1 - 0.60)(0.85) = 0.55$$

▲
Figure 2.8 Venn diagram for Example 2.25.


An immediate generalization of this kind of problem to the case where the
intermediate stage permits k alternatives (whose occurrence is denoted
B₁, B₂, ..., Bₖ) is taken care of by the following theorem, sometimes called the
rule of elimination:

THEOREM 2.12. If the events B₁, B₂, ..., and Bₖ constitute a partition of
the sample space S and P(Bᵢ) ≠ 0 for i = 1, 2, ..., k, then for any event
A in S

$$P(A) = \sum_{i=1}^{k} P(B_i)\cdot P(A|B_i)$$


As was defined in the footnote to page 11, the B's constitute a partition of the 
sample space if they are pairwise mutually exclusive and if their union equals S. 
A formal proof of Theorem 2.12 consists, essentially, of the same steps which
we used in Example 2.25, and it will be left to the reader in Exercise 11 on page
68.


EXAMPLE 2.26 


The members of a consulting firm rent cars from three rental agencies: 60 percent 
from agency 1, 30 percent from agency 2, and 10 percent from agency 3. If 9 
percent of the cars from agency 1 need a tune-up, 20 percent of the cars from 
agency 2 need a tune-up, and 6 percent of the cars from agency 3 need a tune-up, 
what is the probability that a rental car delivered to the firm will need a tune-up? 
Solution


If A is the event that the car needs a tune-up, and B₁, B₂, and B₃ are,
respectively, the events that the car comes from rental agencies 1, 2, or 3,
we have P(B₁) = 0.60, P(B₂) = 0.30, P(B₃) = 0.10, P(A|B₁) = 0.09,
P(A|B₂) = 0.20, and P(A|B₃) = 0.06. Substitution of these values into the
formula of Theorem 2.12 yields

$$P(A) = (0.60)(0.09) + (0.30)(0.20) + (0.10)(0.06) = 0.12$$


Thus, 12 percent of all the rental cars delivered to this firm need a
tune-up. ▲
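The rule of elimination is a weighted sum over the partition. In the sketch below, `total_probability` is a hypothetical helper; building the inputs as exact fractions from decimal strings avoids floating-point rounding in the check that the partition probabilities sum to 1.

```python
from fractions import Fraction as F


def total_probability(priors, likelihoods):
    """Rule of elimination (Theorem 2.12): P(A) = sum of P(B_i) * P(A|B_i).

    priors[i] is P(B_i) for a partition B_1, ..., B_k of S, and
    likelihoods[i] is P(A|B_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per partition event")
    if sum(priors) != 1:
        raise ValueError("partition probabilities must sum to 1")
    return sum(p * q for p, q in zip(priors, likelihoods))


# Example 2.26: three rental agencies.
print(total_probability([F("0.60"), F("0.30"), F("0.10")],
                        [F("0.09"), F("0.20"), F("0.06")]))  # prints: 3/25
# Example 2.25: strike (B) or no strike (B').
print(total_probability([F("0.60"), F("0.40")],
                        [F("0.35"), F("0.85")]))             # prints: 11/20
```

The outputs 3/25 and 11/20 are the values 0.12 and 0.55 found in the two examples.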


With reference to the preceding example, suppose that we are interested in
the following question: If a rental car delivered to the consulting firm needs a
tune-up, what is the probability that it came from rental agency 2? To answer 
questions of this kind, we need the following theorem, called Bayes’ theorem: 


THEOREM 2.13 If the events B₁, B₂, ..., and Bₖ constitute a partition of
the sample space S and P(Bᵢ) ≠ 0 for i = 1, 2, ..., k, then for any event
A in S such that P(A) ≠ 0

$$P(B_r|A) = \frac{P(B_r)\cdot P(A|B_r)}{\sum_{i=1}^{k} P(B_i)\cdot P(A|B_i)}$$

for r = 1, 2, ..., k.


In words, the probability that event A was reached via the rth branch of the tree
diagram of Figure 2.9, given that it was reached via one of its k branches, is the
ratio of the probability associated with the rth branch to the sum of the
probabilities associated with all k branches of the tree.


Proof. Writing P(Bᵣ|A) = P(A ∩ Bᵣ)/P(A) in accordance with the
definition of conditional probability, we have only to substitute
P(Bᵣ) · P(A|Bᵣ) for P(A ∩ Bᵣ) and the expression in the formula of
Theorem 2.12 for P(A). ▼
Figure 2.9 Tree diagram for Bayes' theorem (the branch through Bᵢ to A carries probability P(Bᵢ) · P(A|Bᵢ), for i = 1, 2, ..., k).


EXAMPLE 2.27 


With reference to Example 2.26, if a rental car delivered to the consulting firm
needs a tune-up, what is the probability that it came from rental agency 2?


Solution

Substituting the probabilities on page 64 into the formula of Theorem 2.13,
we get

$$P(B_2|A) = \frac{(0.30)(0.20)}{(0.60)(0.09) + (0.30)(0.20) + (0.10)(0.06)} = \frac{0.060}{0.120} = 0.5$$


Observe that although only 30 percent of the cars delivered to the firm come
from agency 2, 50 percent of those requiring a tune-up come from that
agency. ▲
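Bayes' theorem divides one branch product of the tree by the sum of all branch products. The sketch below (the function `bayes` is a hypothetical helper) uses a 0-based index, so r = 1 refers to B₂, agency 2, in the text's numbering.

```python
from fractions import Fraction as F


def bayes(priors, likelihoods, r):
    """Theorem 2.13: P(B_r|A) = P(B_r)P(A|B_r) / sum_i P(B_i)P(A|B_i).

    Here r is a 0-based index, so r = 1 refers to B_2 in the text's numbering.
    """
    branches = [p * q for p, q in zip(priors, likelihoods)]  # tree branch probabilities
    total = sum(branches)                                     # P(A), by Theorem 2.12
    if total == 0:
        raise ValueError("P(A) must be nonzero")
    return branches[r] / total


priors = [F("0.60"), F("0.30"), F("0.10")]
likelihoods = [F("0.09"), F("0.20"), F("0.06")]
print(bayes(priors, likelihoods, 1))  # prints: 1/2
```

Since the posterior probabilities of B₁, B₂, and B₃ all share the denominator P(A), they sum to 1.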


EXAMPLE 2.28 


In a certain state, 25 percent of all cars emit excessive amounts of pollutants. If
the probability is 0.99 that a car emitting excessive amounts of pollutants will
fail the state's vehicular emission test, and the probability is 0.17 that a car not
emitting excessive amounts of pollutants will nevertheless fail the test, what is
the probability that a car which fails the test actually emits excessive amounts
of pollutants?
Solution


Picturing this situation as in Figure 2.10, we find that the probabilities
associated with the two branches of the tree diagram are (0.25)(0.99) =
0.2475 and (1 − 0.25)(0.17) = 0.1275. Thus, the probability that a car which
fails the test actually emits excessive amounts of pollutants is

$$\frac{0.2475}{0.2475 + 0.1275} = 0.66$$

Of course, this result could also have been obtained without the diagram,
by substituting directly into the formula of Bayes' theorem. ▲
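The two branch products and the resulting posterior probability can be reproduced in a few lines; in this sketch the branch labels are descriptive names chosen here, not the book's notation.

```python
from fractions import Fraction as F

# Tree of Example 2.28: (prior, probability of failing the test) per branch.
branches = {
    "excessive": (F("0.25"), F("0.99")),
    "not excessive": (F("0.75"), F("0.17")),
}
joint = {name: prior * fail for name, (prior, fail) in branches.items()}
p_fail = sum(joint.values())                 # probability that a car fails the test
posterior = joint["excessive"] / p_fail      # Bayes' theorem

print(float(joint["excessive"]), float(joint["not excessive"]))  # prints: 0.2475 0.1275
print(float(p_fail), float(posterior))                           # prints: 0.375 0.66
```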


Figure 2.10 Tree diagram for Example 2.28 (branch probabilities (0.25)(0.99) = 0.2475 and (0.75)(0.17) = 0.1275).


Although Bayes' theorem follows from the postulates of probability and 
the definition of conditional probability, it has been the subject of extensive 
controversy. There can be no question about the validity of Bayes' theorem, but 
considerable arguments have been raised about the interpretation of the prior 
probabilities P(Bᵢ). Also, a good deal of mysticism surrounding Bayes' theorem
is due to the fact that it entails a “backward” or “inverse” sort of reasoning, 
namely, reasoning "from effect to cause," as in Example 2.28. 


THEORETICAL EXERCISES 


1. Show that the three postulates of probability are satisfied by conditional
probabilities; that is, show that with P(B) > 0,
(a) P(A|B) ≥ 0;
(b) P(B|B) = 1;
(c) P(A₁ ∪ A₂ ∪ ··· | B) = P(A₁|B) + P(A₂|B) + ··· for any sequence
of mutually exclusive events A₁, A₂, ....
2. Show by means of numerical examples that P(B|A) + P(B|A')
(a) may equal 1;
(b) need not equal 1.
3. Duplicating the method of proof of Theorem 2.10, show that
P(A ∩ B ∩ C ∩ D) = P(A) · P(B|A) · P(C|A ∩ B) · P(D|A ∩ B ∩ C) provided
P(A ∩ B ∩ C) ≠ 0.

4. Given three events A, B, and C such that P(A ∩ B ∩ C) ≠ 0 and
P(C|A ∩ B) = P(C|B), show that P(A|B ∩ C) = P(A|B).

5. Show that if the two events A and B are independent, then
(a) the two events A' and B are also independent;
(b) the two events A' and B' are also independent.
6. Show that if P(B|A) = P(B) and P(B) ≠ 0, then P(A|B) = P(A).
7. Refer to Figure 2.11 to show that
(a) P(A ∩ B ∩ C) = P(A) · P(B) · P(C) does not necessarily imply that
the events A, B, and C are all pairwise independent;
(b) if A is independent of B and A is independent of C, then B is not
necessarily independent of C;
(c) if A is independent of B and A is independent of C, then A is not
necessarily independent of B ∪ C.


Figure 2.11 Diagram for Exercise 7. 


8. If the three events A, B, and C are independent, show that
(a) A and B ∩ C are independent;
(b) A and B ∪ C are independent;
(c) A' and B ∩ C' are independent.
9. Show that 2^k − k − 1 conditions must be satisfied for k events to be
independent.
10. For any event A, show that A and ∅ are independent.
11. Prove Theorem 2.12, making use of the following generalization of the first
distributive law given in Appendix I on page 555:

$$A\cap(B_1\cup B_2\cup\cdots\cup B_k) = (A\cap B_1)\cup(A\cap B_2)\cup\cdots\cup(A\cap B_k)$$


APPLIED EXERCISES 


12. There are 90 applicants for a job with the news department of a television
station. Some of them are college graduates and some are not, some of them
have at least three years' experience and some have not, with the exact
breakdown being

                                      College      Not college
                                      graduates    graduates
At least three years' experience         18            9
Less than three years' experience        36           27

If the order in which the applicants are interviewed by the station manager
is random, G is the event that the first applicant interviewed is a college
graduate, and T is the event that the first applicant interviewed has at least
three years' experience, determine each of the following probabilities directly
from the entries and the row and column totals of the table:

(a) P(G); (b) P(T); (c) P(G ∩ T);
(d) P(G' ∩ T); (e) P(T|G); (f) P(G|T).
Use these results to verify that

$$P(G|T) = \frac{P(G\cap T)}{P(T)} = \frac{P(G)\cdot P(T|G)}{P(T)}$$


13. With reference to Exercise 26 on page 50, what is the probability that the 
doctor chosen to represent the hospital staff at the convention carries malprac- 
tice insurance given that he or she is a surgeon? 


14. With reference to Exercise 29 on page 51, what is the probability that a
husband will vote in the given school board election given that his wife will 
vote? 


15. With reference to Exercise 31 on page 51, what is the probability that one 
of the students lives in a dormitory given that he or she is from out-of-state? 
16. Basketball teams from Universities A, B, C, and D are quoted as having
probabilities of 0.2, 0.4, 0.3, and 0.1, respectively, of winning the playoff for
the national championship. If University B is placed on probation and
declared ineligible to participate (and it is not replaced by another university),
what is the probability that University A will win the national championship?

17. With reference to Exercise 32 on page 51, find the probabilities that a person
who visits Disneyland will
(a) ride the Monorail given that he will go on the Jungle Cruise;
(b) go on the Matterhorn ride given that he will go on the Jungle Cruise
and ride the Monorail;
(c) not go on the Jungle Cruise given that he will ride the Monorail and/or
go on the Matterhorn ride;
(d) go on the Matterhorn ride and the Jungle Cruise given that he will not
ride the Monorail.
(Hint: Draw a Venn diagram and fill in the various probabilities.)

18. The probability of surviving a certain transplant operation is 0.5. If a patient
survives the operation, the probability that his or her body will reject the
transplant within a month is 0.2. What is the probability of surviving both
of these critical stages?

19. Crates of eggs are inspected for blood clots by randomly removing three eggs
in succession and examining their contents. If all three eggs are good, the
crate is shipped; otherwise it is rejected. What is the probability that a crate
containing 120 eggs of which 10 have blood clots will be shipped?

20. Suppose that in Vancouver, B.C., the probability that a rainy fall day is
followed by a rainy day is 0.80 and the probability that a sunny fall day is
followed by a rainy day is 0.60. Find the probabilities that a rainy fall day
is followed by
(a) a rainy day, a sunny day, and another rainy day;
(b) two sunny days and then a rainy day;
(c) two rainy days and then two sunny days;
(d) rain two days later.
(Hint: In part (c) use the formula of Exercise 3.)

21. Use the formula of Exercise 3 to find the probability of randomly choosing
(without replacement) four healthy guinea pigs from a cage containing 20
guinea pigs of which 15 are healthy and 5 are diseased.

22. A balanced die is tossed twice. If A is the event that an even number comes
up on the first toss, B is the event that an even number comes up on the
second toss, and C is the event that both tosses result in the same number,
are the events A, B, and C independent?

23. If three persons, selected at random, are stopped on a street, what are the
probabilities that
(a) all were born on a Friday;
(b) two were born on a Friday and the other on a Tuesday;
(c) none was born on a Monday?
24. A marksman hits a target with probability å. Assuming independence for
successive firings, find the probabilities of getting
(a) one hit followed by two misses;
(b) two hits and one miss (in any order).

25. A certain coin is loaded so that heads is four times as likely as tails. If this
coin is tossed three times, find the probabilities of getting
(a) all heads;
(b) two tails and a head.

26. An urn contains four black balls and three white balls. If four balls are drawn
in succession, each ball being replaced in the urn before the next one is
drawn, what are the probabilities that
(a) three of the four balls are black and the other is white;
(b) the first and last balls drawn are both white?

27. Medical records show that one out of ten individuals in a certain town has
a low thyroid condition. If 20 persons in this town are randomly chosen and
tested, what is the probability that at least one of them will have a low thyroid
condition?


28. With reference to Figure 2.12 verify that events A, B, C, and D are
independent. Note that the region which represents event A consists of two circles,
and so do the regions representing events B and C.

Figure 2.12 Diagram for Exercise 28.
29. At an electronics plant, it is known from past experience that the probability
is 0.84 that a new worker who has attended the company's training program
will meet the production quota, and that the corresponding probability is
0.49 for a new worker who has not attended the company's training program.
If 70 percent of all new workers attend the training program, what is the
probability that a new worker will meet the production quota?

30. In a T-maze, a rat is given food if it turns left and an electric shock if it turns
right. On the first trial there is a fifty-fifty chance that a rat will turn either
way; then, if it receives food on the first trial the probability is 0.68 that it
will turn left on the next trial, and if it receives a shock on the first trial the
probability is 0.84 that it will turn left on the next trial. What is the probability
that a rat will turn left on the second trial?

31. A mail-order house employs three stock clerks, U, V, and W, who pull items
from shelves and assemble them for subsequent verification and packaging.
U makes a mistake in an order (gets a wrong item or the wrong quantity)
one time in a hundred, V makes a mistake in an order five times in a hundred,
and W makes a mistake in an order three times in a hundred. If U, V, and
W fill, respectively, 30, 40, and 30 percent of all orders, what is the probability
of a mistake in an order?

32. In a certain community, 8 percent of all adults over 50 have diabetes. If a
health service in this community correctly diagnoses 95 percent of all persons
with diabetes as having the disease and incorrectly diagnoses 2 percent of
all persons without diabetes as having the disease, find the probabilities that
(a) the community health service will diagnose an adult over 50 as having
diabetes;
(b) a person over 50 diagnosed by the health service as having diabetes
actually has the disease.

33. With reference to Exercise 29, what is the probability that a new worker who
meets the production quota attended the company's training program?

34. With reference to Exercise 31, if a mistake is found in an order, what is the
probability that it was filled by clerk V?

35. With reference to Example 2.25 on page 62, if we discover later that the job
was completed on time, what is the probability that, nevertheless, there had
been a strike?


36. An explosion at a construction site could have occurred as the result of static
electricity, malfunctioning of equipment, carelessness, or sabotage. Interviews
with construction engineers analyzing the risks involved led to the estimates
that such an explosion would occur with probability 0.25 as a result of static
electricity, 0.20 as a result of malfunctioning of equipment, 0.40 as a result
of carelessness, and 0.75 as a result of sabotage. It is also felt that the prior
probabilities of the four causes of the explosion are, respectively, 0.20, 0.40,
0.25, and 0.15. Based on all this information, what is

(a) the most likely cause of the explosion; 

(b) the least likely cause of the explosion? 


37. The manager of a restaurant knows that the odds are 2 to 1 that a customer 
will not have a cocktail before dinner. If he has a cocktail, the odds are 3 to 
2 that he will not order steak, and if he does not have a cocktail, the odds 
are 3 to 1 that he will not order steak. If a customer has a cocktail and a 
steak, the odds are 9 to 1 that he will not have dessert; if he has a cocktail 
but does not order steak, the odds are 7 to 1 that he will not have dessert; 
if he does not have a cocktail but orders steak, the odds are 3 to 1 that he 
will not have dessert; and if he does not have a cocktail and does not order 
steak, the odds are 2 to 1 that he will not have dessert. 

(a) What is the probability that any one customer of this restaurant will 
order dessert? 

(b) What is the probability that a customer who orders dessert had a cocktail
before dinner? 

(c) What is the probability that a customer who orders steak and dessert 
also had a cocktail before dinner? 


38. An art dealer receives a shipment of five old paintings from abroad, and, on 
the basis of past experience, she feels that the probabilities are, respectively, 
0.76, 0.09, 0.02, 0.01, 0.02, and 0.10 that 0, 1, 2, 3, 4, or all 5 of them are 
forgeries. Since the cost of authentication is fairly high, she decides to select 
one of the five paintings at random and send it away for authentication. If 
it turns out that this painting is a forgery, what probability should she now 
assign to the possibility that all the other paintings are also forgeries? 
Probability Distributions
and Probability Densities


3.1 INTRODUCTION


In most applications of probability theory we are interested only in a particular 
aspect (or in two or a few particular aspects) of the outcomes of experiments. 
For instance, when we roll a pair of dice we are usually interested only in the 
total, and not in the outcome for each die; when we interview a randomly chosen 
married couple we may be interested in the size of their family and in their joint 
income, but not in the number of years they have been married or their total 
assets; and when we sample mass-produced light bulbs we may be interested in 
their durability or their brightness, but not in their price. 

In each of these examples we are interested in numbers which are associated 
with the outcomes of chance experiments, namely, in the values which are taken 
on by so-called random variables. In the language of probability and statistics, 
the total we roll with a pair of dice is a random variable, the size of the family 
of a randomly chosen married couple and their joint income are random variables, 
and so are the durability and the brightness of a light bulb randomly picked for 
inspection. 

To be more explicit, consider Figure 3.1, which (like Figure 2.1 on page
30) pictures the sample space for an experiment in which we roll a pair of dice,
and let us assume that each of the 36 possible outcomes has the probability 1/36.
Note, however, that in Figure 3.1 we have attached a number to each point: for
instance, we attached the number 2 to the point (1, 1), the number 6 to the point
(1, 5), the number 8 to the point (6, 2), the number 11 to the point (5, 6), and so
forth. Evidently, we associated with each point the value of a random variable,
namely, the corresponding total rolled with the pair of dice.

Figure 3.1 The total number of points rolled with a pair of dice.

Since "associating a number with each point (element) of a sample space"
is merely another way of saying that we are "defining a function over the points
of a sample space," let us now make the following definition:


DEFINITION 3.1 If S is a sample space with a probability measure and x is
a real-valued function defined over the elements of S, then x is called a
random variable.*


In this book we shall always write random variables in boldface type and their
values in the corresponding lightface type; for instance, we shall write x to denote
a value of the random variable x. This practice is not universally accepted (many
authors use capital letters to denote random variables), but it is consistent with
the notation of modern mathematics. When actually writing on note paper or on
a blackboard, it is convenient to indicate random variables by underlining the
respective symbols, perhaps with wavy lines, the usual typesetting notation for
boldface type.

* Instead of “random variable,” the terms “chance variable,” “stochastic variable,”
and “variate” are also used in some books.

With reference to the example above, observe that the random variable x
takes on the value 9, and we write x = 9, for the subset

{(6, 3), (5, 4), (4, 5), (3, 6)}

of the sample space S. Thus, x = 9 is to be interpreted as the set of elements of
S for which the total is 9, and more generally, x = x is to be interpreted as the
set of elements of the sample space for which the random variable x takes on
the value x.
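Definition 3.1 can be made concrete in a few lines. The following is a minimal sketch in Python, assuming exact arithmetic with the standard `fractions` module; the names `sample_space`, `prob`, `x`, and `event_x_eq_9` are illustrative and not part of the text. It treats the random variable x literally as a function on the 36 points of Figure 3.1 and recovers the event x = 9 as a subset of S.

```python
from fractions import Fraction
from itertools import product

# Sample space of Figure 3.1: ordered pairs of faces, each assumed to have probability 1/36.
sample_space = list(product(range(1, 7), repeat=2))
prob = {point: Fraction(1, 36) for point in sample_space}


def x(point):
    """The random variable x: a real-valued function on S giving the total rolled."""
    return point[0] + point[1]


# The event "x = 9" is the subset of S on which the function x takes the value 9.
event_x_eq_9 = {point for point in sample_space if x(point) == 9}
p_x_eq_9 = sum(prob[point] for point in event_x_eq_9)

print(sorted(event_x_eq_9, reverse=True))  # [(6, 3), (5, 4), (4, 5), (3, 6)]
print(p_x_eq_9)                            # 1/9, that is, 4/36
```

Nothing about x depends on dice in particular: any rule that maps each sample point to a real number qualifies.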


EXAMPLE 3.1 


Two socks are selected at random and removed in succession from a drawer 
containing five brown socks and three green socks. List the elements of the sample 
space, the corresponding probabilities, and the corresponding values w of the 
random variable w, where w is the number of brown socks selected. 


Solution
If B and G stand for brown and green, the probabilities for BB, BG, GB,
and GG are, respectively, (5/8)(4/7) = 5/14, (5/8)(3/7) = 15/56, (3/8)(5/7) = 15/56,
and (3/8)(2/7) = 3/28, and the results are shown in the following table:

Element of
sample space    Probability    w
BB              5/14           2
BG              15/56          1
GB              15/56          1
GG              3/28           0

Also, we can write P(w = 2) = 5/14, for example, for the probability of the
event that the random variable w will take on the value 2. ▲
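A short Python sketch (exact fractions assumed, variable names illustrative) reproduces the table of Example 3.1 by multiplying the probability of the first draw by the probability of the second draw from the reduced drawer, and then evaluates w on each element:

```python
from fractions import Fraction

brown, green = 5, 3
total = brown + green

# Drawing without replacement: the second factor uses the drawer after the first sock is removed.
probabilities = {
    "BB": Fraction(brown, total) * Fraction(brown - 1, total - 1),
    "BG": Fraction(brown, total) * Fraction(green, total - 1),
    "GB": Fraction(green, total) * Fraction(brown, total - 1),
    "GG": Fraction(green, total) * Fraction(green - 1, total - 1),
}


def w(element):
    """The random variable w: the number of brown socks selected."""
    return element.count("B")


for element, p in probabilities.items():
    print(element, p, w(element))
# BB 5/14 2
# BG 15/56 1
# GB 15/56 1
# GG 3/28 0

p_w_2 = sum(p for e, p in probabilities.items() if w(e) == 2)
print(p_w_2)  # 5/14
```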


EXAMPLE 3.2 


A balanced coin is tossed four times. List the elements of the sample space which 
are presumably all equally likely, and the corresponding values x of the random 
variable x, the total number of heads. 
Solution


If H and T stand for heads and tails, the results are as shown in the following
table:

Element of
sample space    Probability    x
HHHH            1/16           4
HHHT            1/16           3
HHTH            1/16           3
HTHH            1/16           3
THHH            1/16           3
HHTT            1/16           2
HTHT            1/16           2
HTTH            1/16           2
THHT            1/16           2
THTH            1/16           2
TTHH            1/16           2
HTTT            1/16           1
THTT            1/16           1
TTHT            1/16           1
TTTH            1/16           1
TTTT            1/16           0

Thus, we can write P(x = 3) = 4/16, for example, for the probability of the
event that the random variable x will take on the value 3. ▲
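The 16 outcomes of Example 3.2 need not be listed by hand. In the sketch below, which assumes Python's `itertools.product` to generate all sequences and uses illustrative variable names, tallying the value of x for each sample point gives the counts 1, 4, 6, 4, 1:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**4 = 16 sequences of four tosses, assumed equally likely.
sample_space = ["".join(seq) for seq in product("HT", repeat=4)]
p_point = Fraction(1, len(sample_space))

# x = total number of heads; count how many sample points give each value.
counts = Counter(seq.count("H") for seq in sample_space)
distribution = {value: counts[value] * p_point for value in sorted(counts)}

print(len(sample_space))  # 16
print(counts[3])          # 4, so P(x = 3) = 4/16
```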


The fact that the definition on page 75 is limited to real-valued functions 
does not impose any undue restrictions. If the numbers we want to assign to the 
outcomes of an experiment are complex numbers, we can always look upon the 
real and the imaginary parts separately as values taken on by two random 
variables. Also, if we want to describe the outcomes of an experiment qualitatively,
say, by giving the color of a person's hair, we can arbitrarily make the
descriptions real-valued by coding the various colors, perhaps by representing
them with the numbers 1, 2, 3, etc.

In all of the examples of this section we have limited our discussion to 
discrete sample spaces, and hence to discrete random variables, namely, random 
variables whose range is finite or countably infinite. Continuous random variables 
defined over continuous sample spaces will be taken up in Section 3.3. 
3.2 PROBABILITY DISTRIBUTIONS


As we already saw in Examples 3.1 and 3.2, the probability measure defined over 
a discrete sample space automatically provides the probabilities that a random 
variable will take on any given value within its range. 

For instance, having assigned the probability 1/36 to each element of the
sample space of Figure 3.1, we immediately find that the random variable x, the
total rolled with the pair of dice, takes on the value 9 with probability 4/36, since
x = 9 (described on page 76) contains 4 of the 36 equally likely sample points.
The probabilities associated with all possible values of x are shown in the following
table:


x     P(x = x)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36


Instead of displaying the probabilities associated with the values of a random
variable in a table, as we did in the preceding illustration, it is usually preferable
to give a formula, that is, to express the probabilities by means of a function
such that its values, f(x), equal P(x = x) for each x within the range of the
random variable x. For instance, for the total rolled with a pair of dice we can write

$$f(x) = \frac{6 - |x - 7|}{36} \quad \text{for } x = 2, 3, \ldots, 12$$

as can easily be verified by substitution. Clearly,

$$f(2) = \frac{6 - |2 - 7|}{36} = \frac{1}{36}, \quad f(3) = \frac{6 - |3 - 7|}{36} = \frac{2}{36}, \quad \ldots, \quad f(12) = \frac{6 - |12 - 7|}{36} = \frac{1}{36}$$

and all these values agree with the ones shown in the table above.
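A quick way to confirm the closed-form expression for the dice total is to compare it with a brute-force count over the 36 sample points. This is a sketch only; `f` and `totals` are illustrative names, and exact fractions are assumed so that the comparison is exact.

```python
from fractions import Fraction
from itertools import product


def f(x):
    """Closed-form probability distribution of the total rolled with two dice."""
    return Fraction(6 - abs(x - 7), 36)


# Brute-force check: count how many of the 36 equally likely points give each total.
totals = [first + second for first, second in product(range(1, 7), repeat=2)]
for value in range(2, 13):
    assert f(value) == Fraction(totals.count(value), 36)

print([int(f(value) * 36) for value in range(2, 13)])  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```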


DEFINITION 3.2 If x is a discrete random variable, the function given by
f(x) = P(x = x) for each x within the range of x is called the probability
function, or probability distribution, of x.


Based on the postulates of probability, it immediately follows that


THEOREM 3.1 A function can serve as the probability distribution of a
discrete random variable x if and only if its values, f(x), satisfy the conditions

1. f(x) ≥ 0 for each value within its domain;
2. $\sum_x f(x) = 1$, where the summation extends over all the values within
its domain.


EXAMPLE 3.3


Find a formula for the probability distribution of the total number of heads
obtained in four tosses of a balanced coin.


Solution


Based on the probabilities in the table on page 77, we find that P(x = 0) =
1/16, P(x = 1) = 4/16, P(x = 2) = 6/16, P(x = 3) = 4/16, and P(x = 4) = 1/16.
Observing that the numerators of these five fractions, 1, 4, 6, 4, and 1, are
the binomial coefficients $\binom{4}{0}$, $\binom{4}{1}$, $\binom{4}{2}$, $\binom{4}{3}$, and $\binom{4}{4}$, we find that we
can write the formula for the probability distribution as

$$f(x) = \frac{\binom{4}{x}}{16} \quad \text{for } x = 0, 1, 2, 3, 4$$

A theoretical justification for this formula, and a more general treatment
for n tosses of a balanced coin, will be given in Section 5.4. ▲
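The binomial-coefficient formula of Example 3.3 can be checked against direct enumeration in the same spirit. The sketch assumes Python 3.8 or later, where `math.comb` computes binomial coefficients; it is a numerical check for four tosses only, not the general justification deferred to Section 5.4.

```python
from fractions import Fraction
from itertools import product
from math import comb


def f(x):
    """Proposed formula: f(x) = C(4, x) / 16 for x = 0, 1, 2, 3, 4."""
    return Fraction(comb(4, x), 16)


# Number of heads in each of the 16 equally likely sequences of four tosses.
heads = [seq.count("H") for seq in product("HT", repeat=4)]
for x in range(5):
    assert f(x) == Fraction(heads.count(x), 16)

print([comb(4, x) for x in range(5)])  # [1, 4, 6, 4, 1]
```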


EXAMPLE 3.4 


Check whether the function given by

$$f(x) = \frac{x + 2}{25} \quad \text{for } x = 1, 2, 3, 4, 5$$

can serve as the probability function of a random variable.


Solution


Substituting the different values of x, we get f(1) = 3/25, f(2) = 4/25, f(3) = 5/25,
f(4) = 6/25, and f(5) = 7/25. Since these values are all non-negative, the first
condition of Theorem 3.1 is satisfied, and since

$$f(1) + f(2) + f(3) + f(4) + f(5) = \frac{3}{25} + \frac{4}{25} + \frac{5}{25} + \frac{6}{25} + \frac{7}{25} = 1$$

the second condition of Theorem 3.1 is satisfied. Thus, the given function
can serve as the probability distribution of a random variable having the
range {1, 2, 3, 4, 5}. Of course, whether any given random variable actually
has this probability distribution is an entirely different matter. ▲
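The two conditions of Theorem 3.1 translate directly into a check over a finite domain. In this sketch the helper `is_probability_distribution` is a hypothetical name, and exact fractions are used so that the sum is compared with 1 without floating-point rounding:

```python
from fractions import Fraction


def f(x):
    """Candidate probability function of Example 3.4: f(x) = (x + 2) / 25."""
    return Fraction(x + 2, 25)


def is_probability_distribution(func, domain):
    """Check the two conditions of Theorem 3.1 on a finite domain."""
    values = [func(x) for x in domain]
    nonnegative = all(v >= 0 for v in values)   # condition 1
    sums_to_one = sum(values) == 1              # condition 2
    return nonnegative and sums_to_one


domain = [1, 2, 3, 4, 5]
print([str(f(x)) for x in domain])             # ['3/25', '4/25', '1/5', '6/25', '7/25']
print(is_probability_distribution(f, domain))  # True
```

As the text stresses, a True result only says the function *can* serve as a probability distribution; it says nothing about which random variable, if any, actually has it.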


In some problems it is desirable to present probability distributions graphi- 
cally, and two kinds of graphical presentations used for this purpose are shown 
in Figures 3.2 and 3.3. The one shown in Figure 3.2, called a probability histogram, 
represents the probability distribution of Example 3.3. The height of each rectangle 
equals the probability that x takes on the value which corresponds to the midpoint 
of its base. By representing 0 with the interval from -0.5 to 0.5, 1 with the interval
from 0.5 to 1.5, ..., and 4 with the interval from 3.5 to 4.5, we are, so to speak,
"spreading" the values of the given discrete random variable over a continuous
scale.

Since each rectangle of the histogram of Figure 3.2 has unit width, we could
have said that the areas of the rectangles, rather than their heights, equal the
corresponding probabilities. There are certain advantages to identifying the areas
of the rectangles with the probabilities; for instance, when we wish to approximate
the graph of a discrete probability distribution with a continuous curve. This can
be done even when the rectangles of a histogram do not all have unit width, by
adjusting the heights of the rectangles or by modifying the vertical scale.
Figure 3.2 Probability histogram. (Horizontal axis: number of heads, x = 0, 1, 2, 3, 4.)


Figure 3.3 Bar chart. (Horizontal axis: number of heads, x = 0, 1, 2, 3, 4.)
The graph of Figure 3.3 is called a bar chart. As in Figure 3.2, the height
of each rectangle, or bar, equals the probability of the corresponding value of 
the random variable, but there is no pretense of having a continuous horizontal 
scale. Although there are several occasions where we shall use such charts in this 
text, histograms and bar charts are used mainly in descriptive statistics to convey 
visually the information provided by a probability distribution or a distribution 
of actual data. 

There are many problems in which it is of interest to know the probability
that the value of a random variable is less than or equal to some real number x.
Thus, let us write the probability that x takes on a value less than or equal to x
as F(x) = P(x ≤ x), and refer to this function defined for all real numbers x
as the distribution function, or cumulative distribution, of the random variable x.


DEFINITION 3.3 If x is a discrete random variable, the function given by

$$F(x) = P(x \le x) = \sum_{t \le x} f(t) \quad \text{for } -\infty < x < \infty$$

where f(t) is the value of the probability distribution of x at t, is called the
distribution function, or cumulative distribution, of x.


Based on the postulates of probability and some of its immediate consequences,
it follows that


THEOREM 3.2 The values, F(x), of the distribution function of a discrete
random variable x satisfy the conditions

1. F(-∞) = 0;
2. F(∞) = 1;
3. if a < b, then F(a) ≤ F(b) for any real numbers a and b.


If we are given a discrete probability distribution, the corresponding distribution
function is generally easy to find.


EXAMPLE 3.5


Find the distribution function of the total number of heads obtained in four
tosses of a balanced coin.
Solution


Given f(0) = 1/16, f(1) = 4/16, f(2) = 6/16, f(3) = 4/16, and f(4) = 1/16 from
Example 3.3, it follows that

F(0) = f(0) = 1/16
F(1) = f(0) + f(1) = 5/16
F(2) = f(0) + f(1) + f(2) = 11/16
F(3) = f(0) + f(1) + f(2) + f(3) = 15/16
F(4) = f(0) + f(1) + f(2) + f(3) + f(4) = 1

Hence, the distribution function is given by

$$F(x) = \begin{cases}
0 & \text{for } x < 0 \\
\frac{1}{16} & \text{for } 0 \le x < 1 \\
\frac{5}{16} & \text{for } 1 \le x < 2 \\
\frac{11}{16} & \text{for } 2 \le x < 3 \\
\frac{15}{16} & \text{for } 3 \le x < 4 \\
1 & \text{for } x \ge 4
\end{cases}$$

Observe that this distribution function is defined not only for the
values taken on by the given random variable, but for all real numbers. For
instance, we can write F(1.7) = 5/16 and F(100) = 1, although the probabilities
of getting "at most 1.7 heads" or "at most 100 heads" in four tosses
of a balanced coin may not be of any real significance. ▲
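Because F(x) is defined for every real x, a programmatic version must handle values that lie between the points of the range. One possible sketch, assuming Python's `bisect` module to locate x among the sorted values (names are illustrative), builds the step function of Example 3.5 from cumulative sums:

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3, 4]
probs = [Fraction(k, 16) for k in (1, 4, 6, 4, 1)]
cumulative = list(accumulate(probs))  # F at each value: 1/16, 5/16, 11/16, 15/16, 1


def F(x):
    """Step-function distribution function, defined for every real x."""
    i = bisect_right(values, x)   # number of values that are <= x
    return Fraction(0) if i == 0 else cumulative[i - 1]


print(F(-2), F(1.7), F(100))  # 0 5/16 1
```

Using `bisect_right` makes F take the larger value at each jump (for example F(0) = 1/16), which matches the convention F(x) = P(x ≤ x).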


EXAMPLE 3.6 


Find the distribution function of the random variable w of Example 3.1 and plot 
its graph. 


Solution


Based on the probabilities given in the table on page 76, we can write
f(0) = 3/28, f(1) = 15/56 + 15/56 = 15/28, and f(2) = 5/14, so that

F(0) = f(0) = 3/28
F(1) = f(0) + f(1) = 9/14
F(2) = f(0) + f(1) + f(2) = 1

Hence, the distribution function of w is given by

$$F(w) = \begin{cases}
0 & \text{for } w < 0 \\
\frac{3}{28} & \text{for } 0 \le w < 1 \\
\frac{9}{14} & \text{for } 1 \le w < 2 \\
1 & \text{for } w \ge 2
\end{cases}$$

The graph of this distribution function, shown in Figure 3.4, was
obtained by first plotting the points (w, F(w)) for w = 0, 1, and 2. Then,
the step function is completed as indicated, and it should be observed that
at all points of discontinuity it takes on the greater of the two values. ▲


Figure 3.4 Graph of the distribution function of Example 3.6.


We can also reverse the process illustrated in the two preceding examples, 
namely, obtain values of the probability distribution of a random variable from 
its distribution function. To this end, we use the following result: 


THEOREM 3.3 If the range of a random variable x consists of the values
x_1 < x_2 < x_3 < ... < x_n, then f(x_1) = F(x_1) and

$$f(x_i) = F(x_i) - F(x_{i-1}) \quad \text{for } i = 2, 3, \ldots, n$$
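EXAMPLE 3.7


Given the distribution function

$$F(x) = \begin{cases}
0 & \text{for } x < 2 \\
\frac{1}{36} & \text{for } 2 \le x < 3 \\
\frac{3}{36} & \text{for } 3 \le x < 4 \\
\frac{6}{36} & \text{for } 4 \le x < 5 \\
\frac{10}{36} & \text{for } 5 \le x < 6 \\
\frac{15}{36} & \text{for } 6 \le x < 7 \\
\frac{21}{36} & \text{for } 7 \le x < 8 \\
\frac{26}{36} & \text{for } 8 \le x < 9 \\
\frac{30}{36} & \text{for } 9 \le x < 10 \\
\frac{33}{36} & \text{for } 10 \le x < 11 \\
\frac{35}{36} & \text{for } 11 \le x < 12 \\
1 & \text{for } x \ge 12
\end{cases}$$

find the values of the probability distribution of this random variable.


Solution
Making use of Theorem 3.3, we get f(2) = 1/36, f(3) = 3/36 - 1/36 = 2/36,
f(4) = 6/36 - 3/36 = 3/36, f(5) = 10/36 - 6/36 = 4/36, ..., f(12) = 1 - 35/36 = 1/36,
and comparison with the probabilities in the table on page 78 reveals that the
random variable we are concerned with here is the total number of points rolled
with a pair of dice. ▲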
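Theorem 3.3 amounts to taking successive differences of the values of F at the points of the range. A minimal sketch, with the values of F copied from Example 3.7 and illustrative variable names:

```python
from fractions import Fraction

# Values x_1 < x_2 < ... < x_n of the random variable and F(x_i), read off Example 3.7.
xs = list(range(2, 13))
F = [Fraction(k, 36) for k in (1, 3, 6, 10, 15, 21, 26, 30, 33, 35, 36)]

# Theorem 3.3: f(x_1) = F(x_1) and f(x_i) = F(x_i) - F(x_{i-1}) for i = 2, ..., n.
f = [F[0]] + [F[i] - F[i - 1] for i in range(1, len(F))]
pmf = dict(zip(xs, f))

print([int(pmf[x] * 36) for x in xs])  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```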


In the remainder of this chapter we will be concerned with continuous 
random variables and their distributions, and with problems relating to the 
simultaneous occurrence of the values of two or more random variables. In 
Chapter 5 we shall return to discrete probability distributions; in fact, all of that 
chapter will be devoted to discrete probability distributions which provide 
especially important models for applications. 


THEORETICAL EXERCISES 
1. Verify that $f(x) = \frac{2x}{k(k+1)}$ for x = 1, 2, 3, ..., k can serve as the probability
distribution of a random variable.
2. For each of the following, determine c so that the function can serve as the
probability distribution of a random variable:


(a) f(x) = cx for x = 1, 2, 3, 4, 5;

(b) $f(x) = c\binom{5}{x}$ for x = 0, 1, 2, 3, 4, 5;
(c) f(x) = cx² for x =
(d) f(x) = c2^x for x =


(e) f(x) = eG) forx- 
[Hint: For part (c) refer to Abpea її at the end of the book.] 




3. For what values of k can

$$f(x) = (1 - k)k^x \quad \text{for } x = 0, 1, 2, \ldots$$

serve as the probability distribution of a random variable?


4. Show that there are no values of c such that the following can serve as
probability distributions:


(a) f(x) = c/x for x = 1, 2, 3, ...;


(b) f(x) = c2^x for x = 1, 2, 3, ....


5. Construct a probability histogram for each of the following discrete probability
distributions:

(a) $f(x) = \frac{\binom{2}{x}\binom{4}{3-x}}{\binom{6}{3}}$ for x = 0, 1, 2;
(b) $f(x) = \binom{5}{x}\left(\frac{1}{5}\right)^x\left(\frac{4}{5}\right)^{5-x}$ for x = 0, 1, 2, 3, 4, 5.


6. Prove Theorem 3.2.


7. Find the distribution function which corresponds to the probability distribution
of part (a) of Exercise 5 and plot its graph.


8. Find the distribution function which corresponds to the probability distribution

$$f(x) = \frac{x}{15} \quad \text{for } x = 1, 2, 3, 4, 5$$
9. Given that the discrete random variable x has the distribution function

$$F(x) = \begin{cases}
0 & \text{for } x < 1 \\
\frac{1}{3} & \text{for } 1 \le x < 4 \\
\frac{1}{2} & \text{for } 4 \le x < 6 \\
\frac{5}{6} & \text{for } 6 \le x < 10 \\
1 & \text{for } x \ge 10
\end{cases}$$

find

(a) P(2 < x ≤ 6);

(b) P(x = 4);

(c) the probability distribution of x.


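10. Given that x has the distribution function

$$F(x) = \begin{cases}
0 & \text{for } x < -1 \\
\frac{1}{4} & \text{for } -1 \le x < 1 \\
\frac{1}{2} & \text{for } 1 \le x < 3 \\
\frac{3}{4} & \text{for } 3 \le x < 5 \\
1 & \text{for } x \ge 5
\end{cases}$$

find
(a) P(x ≤ 3); (b) P(x = 3); (c) P(x < 3);
(d) P(x ≥ 1); (e) P(-0.4 < x < 4); (f) P(x = 5).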


11. With reference to Example 3.4, verify that for x = 1, 2, 3, 4, and 5, the values
of the distribution function are given by

$$F(x) = \frac{x^2 + 5x}{50}$$


12. With reference to Theorem 3.3, verify that
(a) P(x > x_i) = 1 - F(x_i) for i = 1, 2, ..., n;
(b) P(x ≥ x_i) = 1 - F(x_{i-1}) for i = 2, 3, ..., n, and P(x ≥ x_1) = 1.


APPLIED EXERCISES


13. With reference to Example 3.3, find the probability distribution of y, the
difference between the number of heads and the number of tails obtained in
four tosses of a balanced coin.
14. An urn contains four balls numbered 1, 2, 3, and 4, respectively. If two balls
are drawn from the urn without replacement and x is the sum of the numbers 
on the two balls drawn, find 
(a) the probability distribution of x and represent it by means of a histogram; 
(b) the distribution function of x and draw its graph. 


15. A tape recorder contains six transistors, of which two are defective. If two 
of these transistors are selected at random, removed from the tape recorder, 
and inspected, and if x is the number of defectives observed, find 
(a) the probability distribution of x; 

(b) the distribution function of x. 
Also draw a histogram of the probability distribution and a graph of the 
distribution function. 




16. A coin is biased so that heads is three times as likely as tails. For three 
independent tosses of the coin, find 
(a) the probability distribution of x, the total number of heads; 
(b) the probability of getting at most two heads. 


17. With reference to Exercise 16, find the distribution function of the random 
variable x and plot its graph. Using this distribution function, find 
(a) P(1 < x < 3);
(b) P(x > 2). 


18. The probability distribution of x, the weekly number of accidents at a certain 
intersection, is given by f(0) = 0.40, f(1) = 0.30, f(2) = 0.20, and f(3) = 
0.10. Construct the distribution function of this random variable and draw 
its graph. 

19. The probabilities that a person shopping at a certain department store will 
not make a purchase, make at most one purchase, at most two purchases, at 
most three purchases, or at most four purchases, are, respectively, 0.22, 0.54, 
0.87, 0.91, and 1.00. Find the probabilities that a person shopping at this 
department store will make 


(a) two purchases; (b) more than two purchases; 
(c) three purchases; (d) at least one purchase. 


3.3 CONTINUOUS RANDOM VARIABLES


In Section 3.1 we introduced the concept of a random variable as a real-valued 
function defined over the points of a sample space with a probability measure, 
and in Figure 3.1 we illustrated this by assigning the total rolled with a pair of 
dice to each of the 36 equally likely points of the sample space. In the continuous 
case, where random variables can take on values on a continuous scale, the 
procedure is very much the same. The outcomes of experiments are represented
by the points on line segments or lines, and the values of random variables are 
numbers appropriately assigned to the points by means of rules or equations. 
When the value of a random variable is given directly by a measurement or 
observation, we generally do not bother to distinguish between the value of the 
random variable (the measurement which we obtain) and the outcome of the 
experiment (the corresponding point on the real axis). Thus, if an experiment 
consists of determining the actual content of a 230-gram jar of instant coffee, the 
result itself, say, 225.3 grams, is the value of the random variable with which we 
are concerned, and there is no real need to add that the sample space consists 
of a certain continuous interval of points on the positive real axis. 

The problem of defining probabilities in connection with continuous sample 
spaces and continuous random variables involves some complications. To illustrate,
let us consider the following situation:


EXAMPLE 3.8 


Suppose that we are concerned with the possibility that an accident will occur
on a freeway which is 200 kilometers long, and that we are interested in the
probability that it will occur at a given location, or perhaps on a given stretch
of the road. The sample space of this "experiment" consists of a continuum of
points, those on the interval from 0 to 200, and we shall assume, for the sake of
argument, that the probability that an accident will occur on any interval of
length D is D/200, with D measured in kilometers. Note that this assignment of
probabilities is consistent with Postulates 1 and 2 on page 36, since the
probabilities D/200 are all non-negative and P(S) = 200/200 = 1. So far, this assignment
of probabilities applies only to intervals on the line segment from 0 to 200, but
if we use Postulate 3, we can also obtain probabilities for the union of any finite
or countably infinite sequence of non-overlapping intervals. For instance, the
probability that an accident will occur on either of two non-overlapping intervals
of length D_1 and D_2 is

$$\frac{D_1 + D_2}{200}$$

and the probability that it will occur on one of a countably infinite sequence of
non-overlapping intervals of length D_1, D_2, D_3, ..., is

$$\frac{D_1 + D_2 + D_3 + \cdots}{200}$$

Then, if we apply Theorem 2.7, we can extend the probability assignment to the
union of intervals that overlap, and since the intersection of two intervals is an
interval and the complement of an interval is either an interval or the union of
two intervals, we can extend the probability assignment to any subset of the
sample space which can be obtained by forming unions or intersections of finitely
many or countably many intervals, or by forming complements. ▲


Thus, in extending the concept of probability to the continuous case, we 
have again used Postulates 1, 2, and 3, but to do this in general we must exclude 
from our definition of "event" all subsets of the sample space which cannot be 
obtained by forming unions or intersections of finitely many or countably many 
intervals, or by forming complements. Practically speaking, this is of no con- 
sequence, for we simply do not assign probabilities to such abstruse kinds of sets. 

With reference to Example 3.8, observe also that the probability of the 
accident occurring on a very short interval, say, an interval of 1 centimeter, is 
only 0.00000005, which is very small. As the length of the interval approaches 
zero, the probability that an accident will occur on it also approaches zero; 
indeed, in the continuous case we always assign zero probability to individual 
points. This does not mean that the corresponding events cannot occur; after
all, when an accident occurs on the 200-kilometer stretch of road, it has to occur
at some point even though each point has zero probability.
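The probability assignment of Example 3.8 is simple enough to code directly. This sketch assumes the D/200 rule of the example and uses exact fractions; `p_interval` and `p_union_disjoint` are hypothetical helper names, and the second one is valid only for non-overlapping intervals, as Postulate 3 requires.

```python
from fractions import Fraction

ROAD_KM = 200  # length of the freeway in Example 3.8


def p_interval(length_km):
    """Probability that the accident occurs on a given interval of this length (D/200)."""
    return Fraction(length_km) / ROAD_KM


def p_union_disjoint(lengths_km):
    """Postulate 3: probabilities of non-overlapping intervals add."""
    return sum(p_interval(d) for d in lengths_km)


one_cm = Fraction(1, 100_000)        # 1 centimeter expressed in kilometers
print(float(p_interval(one_cm)))     # 5e-08, i.e. 0.00000005
print(p_union_disjoint([10, 30]))    # 1/5
print(p_interval(ROAD_KM))           # 1, i.e. P(S) = 1
```

Letting the interval length shrink toward zero drives `p_interval` toward zero, which is the sense in which individual points receive zero probability.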


3.4 PROBABILITY DENSITY FUNCTIONS


The way in which we assigned probabilities in Example 3.8 is very special, and 
it is similar in nature to the way in which we assign equal probabilities to the 
six faces of a die, heads and tails, the 52 playing cards in a standard deck, and 
so forth. To treat the problem of associating probabilities with continuous random 
variables more generally, suppose that a bottler of soft drinks is concerned about 
the actual amount of a soft drink which his bottling machine puts into 16-ounce 
bottles. Evidently, the amount will vary somewhat from bottle to bottle and hence 
is a continuous random variable. However, if he rounds the amounts to the 
nearest tenth of an ounce, he will be dealing with a discrete random variable 
which has a probability distribution, and this probability distribution may be 
pictured as a histogram in which the probabilities are given by the areas of 
rectangles, say, as in the diagram at the top of Figure 3.5. If he rounds the 
amounts to the nearest hundredth of an ounce, he will again be dealing with a
discrete random variable (a different one) which has a probability distribution, 
and this probability distribution may be pictured as a histogram in which the 
probabilities are given by the areas of rectangles, say, as in the diagram in the 
middle of Figure 3.5. 
Figure 3.5 Definition of probability in the continuous case. (Panels: amounts
rounded to the nearest tenth of an ounce; amounts rounded to the nearest
hundredth of an ounce; limiting continuous curve.)


It should be apparent that if he rounded the amounts to the nearest
thousandth of an ounce, or to the nearest ten-thousandth of an ounce, the
histograms of the probability distributions of the corresponding discrete random
variables would approach the continuous curve shown in the diagram at the bottom
of Figure 3.5, and the sum of the areas of the rectangles which represent the
probability that the amount falls within any specified interval would approach the
corresponding area under the curve.

Indeed, the definition of probability in the continuous case presumes for
each random variable the existence of a function, called a probability density
function, so that areas under the curve give the probabilities associated with the
corresponding intervals along the horizontal axis. In other words, a probability
density function, integrated from a to b (with a < b), gives the probability that
the corresponding random variable will take on a value on the interval from a
to b.


DEFINITION 3.4 A function with values f(x), defined over the set of all real
numbers, is called a probability density function of the continuous random
variable x if and only if

$$P(a \le x \le b) = \int_a^b f(x)\,dx$$

for any real constants a and b with a ≤ b.


Probability density functions are also referred to, more briefly, as probability 
densities, density functions, densities, and p.d.f.'s. 

Note that f(c), the value of the probability density function of x at c, does
not give P(x = c) as in the discrete case. In connection with continuous random
variables, probabilities are always given by integrals evaluated over intervals,
whereas P(x = c) = 0 for any real constant c. This agrees with our discussion
on page 90, and it also follows directly from Definition 3.4 with a = b = c.

Because of this property, the value of a probability density function can be
changed for some of the values of a random variable without changing any of
the probabilities, and this is why we said in Definition 3.4 that f(x) is the value
of a probability density, not the probability density, of the random variable x at
x. Also because of this property, it does not matter whether we include the
endpoints of the interval from a to b; symbolically,


THEOREM 3.4 If x is a continuous random variable and a and b are two
real constants with a ≤ b, then

$$P(a \le x \le b) = P(a \le x < b) = P(a < x \le b) = P(a < x < b)$$


Analogous to Theorem 3.1, let us now state the following properties of 
probability density functions, which again follow directly from the postulates of 
probability: 
THEOREM 3.5 A function can serve as a probability density function of a
continuous random variable x if its values, f(x), satisfy the conditions†

1. f(x) ≥ 0 for -∞ < x < ∞;
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.


EXAMPLE 3.9 
The probability density function of the random variable x is given by 


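$$f(x) = \begin{cases}
ke^{-3x} & \text{for } x > 0 \\
0 & \text{elsewhere}
\end{cases}$$

Find k and P(0.5 < x < 1).


Solution


To satisfy the second condition of Theorem 3.5, we must have

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} ke^{-3x}\,dx = k \cdot \lim_{t \to \infty} \left[ -\frac{e^{-3x}}{3} \right]_0^t = \frac{k}{3} = 1$$

and it follows that k = 3. For the probability we get

$$P(0.5 < x < 1) = \int_{0.5}^{1} 3e^{-3x}\,dx = \left[ -e^{-3x} \right]_{0.5}^{1} = -e^{-3} + e^{-1.5} = 0.173$$

▲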

Although the random variable of Example 3.9 cannot take on negative 
values, we artificially extended the domain of its probability density to include 
all the real numbers. This is a practice we shall follow throughout this text. 

As in the discrete case, there are many problems in which it is of interest 
to know the probability that the value of a continuous random variable is less 


t The conditions are not "if and only if” as in Theorem 3.1 because f(x) could be 
negative for some values of the random variable without affecting any of the probabilities. 
However, both conditions of Theorem 3.5 will be satisfied by all the probability density 
functions we shall study in this text. 
than or equal to some real number x. Thus, let us make the following definition
analogous to Definition 3.3:


DEFINITION 3.5 If x is a continuous random variable, the function given by

$$F(x) = P(\mathbf{x} \le x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{for } -\infty < x < \infty$$

where f(t) is the value of the probability density function of x at t, is called
the distribution function, or cumulative distribution, of x.


The properties of distribution functions given in Theorem 3.2 hold also for
the continuous case; that is, F(−∞) = 0, F(∞) = 1, and F(a) ≤ F(b) when
a < b. Furthermore, it immediately follows from Definition 3.5 that


THEOREM 3.6 If f(x) and F(x) are, respectively, values of the probability
density and the distribution function of x at x, then

$$P(a \le \mathbf{x} \le b) = F(b) - F(a)$$

for any real constants a and b with a ≤ b, and

$$f(x) = \frac{dF(x)}{dx}$$

where the derivative exists.


EXAMPLE 3.10 


Find the distribution function which corresponds to the probability density
function of Example 3.9. Also, use this distribution function to reevaluate
P(0.5 ≤ x ≤ 1).


Solution 


For x > 0,

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^x 3e^{-3t}\,dt = \left. -e^{-3t} \right|_0^x = 1 - e^{-3x}$$

and we can write

$$F(x) = \begin{cases} 0 & \text{for } x \le 0 \\ 1 - e^{-3x} & \text{for } x > 0 \end{cases}$$

To find the probability P(0.5 ≤ x ≤ 1) we make use of the formula of the
first part of Theorem 3.6, getting

$$P(0.5 \le \mathbf{x} \le 1) = F(1) - F(0.5) = (1 - e^{-3}) - (1 - e^{-1.5}) = 0.173$$


This agrees with the result obtained by using the probability density function 
in Example 3.9. ▲
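Both parts of Theorem 3.6 can be checked numerically against Example 3.10. The sketch below is illustrative only; the function name `F`, the test point x0 = 0.7 and the step h are arbitrary choices of ours.

```python
import math


def F(x):
    """Distribution function found in Example 3.10."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-3.0 * x)


# First part of Theorem 3.6: P(a <= x <= b) = F(b) - F(a).
p = F(1.0) - F(0.5)

# Second part of Theorem 3.6: f(x) = dF(x)/dx.
# A central difference at x0 should be close to 3e^{-3 x0}.
x0, h = 0.7, 1e-6
slope = (F(x0 + h) - F(x0 - h)) / (2 * h)

print(f"P(0.5 <= x <= 1) = {p:.3f}")
print(f"dF/dx at {x0}: {slope:.6f} vs 3e^(-3x) = {3 * math.exp(-3 * x0):.6f}")
```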


EXAMPLE 3.11 


Find a probability density function for the random variable whose distribution 
function is given by 


$$F(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } 0 < x < 1 \\ 1 & \text{for } x \ge 1 \end{cases}$$


Solution 


The graph of this distribution function is shown in Figure 3.6, and it can
be seen that it is continuous and differentiable everywhere except at x = 0
and x = 1.

Figure 3.6 Graph of the distribution function of Example 3.11.

Differentiating the distribution function for x < 0, 0 < x < 1,
and x > 1, we get

$$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } 0 < x < 1 \\ 0 & \text{for } x > 1 \end{cases}$$


and to fill the two gaps we let f(0) and f(1) both equal zero. Actually, it
does not matter how the probability density function is defined at these
two points, but there are certain advantages (which will be explained on
page 247) for choosing the values in such a way that the probability density
function is non-zero over an open interval. Thus, we can write the probability
density function as

$$f(x) = \begin{cases} 1 & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

and the graph of this function is shown in Figure 3.7. ▲

Figure 3.7 Graph of the probability density function of Example 3.11.
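The statement that the distribution function of Example 3.11 fails to be differentiable only at x = 0 and x = 1 can be illustrated with one-sided difference quotients. This is a minimal sketch; the helper names and the step h = 1e-7 are our own.

```python
def F(x):
    """Distribution function of Example 3.11."""
    if x <= 0:
        return 0.0
    if x < 1:
        return x
    return 1.0


def one_sided_slopes(x, h=1e-7):
    """Left and right difference quotients of F at x."""
    left = (F(x) - F(x - h)) / h
    right = (F(x + h) - F(x)) / h
    return left, right


# Where the two quotients disagree (x = 0 and x = 1), F has no derivative;
# elsewhere both approximate f(x), which is 1 on (0, 1) and 0 outside.
for x in (-0.5, 0.0, 0.3, 1.0, 1.5):
    print(x, one_sided_slopes(x))
```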


In most practical applications we encounter random variables that are either 
discrete or continuous, so that the corresponding distribution functions have 
either a step-like appearance as in Figure 3.4 or they are continuous curves as 
in Figure 3.6. Discontinuous distribution functions like that of Figure 3.8 arise 
when random variables are mixed. The distribution function of such a random 
variable will be discontinuous at each point having a non-zero probability and 
continuous elsewhere. As in the discrete case, the height of the step at a point 
of discontinuity gives the corresponding probability that the random variable 
will take on that value. With reference to Figure 3.8, P(x = 0.5) = 3/4 − 1/4 = 1/2,
but otherwise the random variable is like a continuous random variable.
Figure 3.8 Graph of the distribution function of a mixed random variable.


We shall limit ourselves in this text to random variables which are either discrete or
continuous, with the latter having distribution functions which are differentiable
for all but a finite set of values of the random variables.


THEORETICAL EXERCISES 


1. The probability density function of the continuous random variable x is given
by

$$f(x) = \begin{cases} \frac{1}{5} & \text{for } 2 < x < 7 \\ 0 & \text{elsewhere} \end{cases}$$

(a) Show that the area under the curve (above the x-axis) is equal to 1.
(b) Find P(3 < x < 5).


2. If the probability density of the continuous random variable y is given by

$$f(y) = \begin{cases} \frac{1}{8}(y + 1) & \text{for } 2 < y < 4 \\ 0 & \text{elsewhere} \end{cases}$$

find
(a) P(y < 3.2);
(b) P(2.9 < y < 3.2).


3. If the p.d.f. of the random variable x is given by

$$f(x) = \begin{cases} \dfrac{c}{\sqrt{x}} & \text{for } 0 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) the value of c;
(b) the distribution function of this random variable;
(c) P(x > 1).


4. If the probability density function of the random variable z is given by 


Кет" forz > 0 
0 for z ≤ 0


f(z) = { 


find 

(a) the value of k; 

(b) the distribution function of this random variable. 

Also sketch the graphs of the probability density and distribution functions.


5. If the density function of the random variable x is given by 


com [HE - x) for0<x<1 
8 10 elsewhere 


find 

(а) Р(х > 4); 

(b) the distribution function of this random variable; 

(c) the value of m such that G(m) = 0.5, namely, such that m is the median 
of the distribution of x. 


6. If the probability density function of the random variable w is given by 


cwt+w  for0<w<1 
Ху) = 
0 elsewhere 
find 
(a) the value of c; 
(b) the distribution function of this random variable and plot its graph; 
(с) P(OO<ws}). 


7. Find the distribution function of the random variable x whose probability 
density is given by 


1 for0<x<1 
f(x)-213  fo2«xc-«4 
0 elsewhere 


Sketch the graphs of the probability density and distribution functions. 


8. If the density function of the random variable z is given by

$$h(z) = \begin{cases} -kz & \text{for } -1 < z < 0 \\ kz & \text{for } 0 < z < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) the value of k;
(b) the distribution function of this random variable and sketch its graph;
(c) P(-i«z«2.


9. If the density function of the random variable x is given by

$$f(x) = \begin{cases} x & \text{for } 0 < x < 1 \\ 2 - x & \text{for } 1 \le x < c \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) the value of c;
(b) the distribution function of x;
(c) P(0.8 < x < 0.6c).


10. Find the distribution function of the random variable x whose density function
is given by

$$f(x) = \begin{cases} \frac{x}{2} & \text{for } 0 < x \le 1 \\ \frac{1}{2} & \text{for } 1 < x \le 2 \\ \frac{3 - x}{2} & \text{for } 2 < x < 3 \\ 0 & \text{elsewhere} \end{cases}$$

Also sketch the graphs of the density and distribution functions.


11. If the distribution function of the random variable x is given by

$$F(x) = \begin{cases} 0 & \text{for } x < -1 \\ \dfrac{x + 1}{2} & \text{for } -1 \le x < 1 \\ 1 & \text{for } x \ge 1 \end{cases}$$
find

(a) P(-i«x«

(b) P(2 < x < 3);

(c) the probability density of this random variable and use it to recalculate
the probability of part (a).


12. If the distribution function of the random variable y is given by

$$F(y) = \begin{cases} 1 - \dfrac{9}{y^2} & \text{for } y > 3 \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) P(y < 5);

(b) P(y > 8);

(c) the probability density of y, letting its value equal zero wherever it is
undefined.

Also sketch the graphs of the two functions.


13. If the distribution function of the random variable x is given by

$$F(x) = \begin{cases} 1 - (1 + x)e^{-x} & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$

find

(a) P(x ≤ 2);

(b) P(1 < x < 3);

(c) P(x > 4);

(d) the probability density function of x. Are there any points at which it
is undefined?

Also sketch the graphs of the distribution and density functions of x.


14. If the distribution function of the random variable x is given by
0 forx <0
7 for0<x <1
F(x) = 1
Hig forl < x< 1.5
1 for x = 1.5

find

(a) P(0.4 < x < 1.3);

(b) P(x > 0.5);

(c) the probability density of x.


15. If the distribution function of the random variable z is given by

0 forz < -2

z+4
F(z) =) Seu for-2<z<2

1 forz > 2
find
(a) P(z = −2); (b) P(z = 2);
(c) Р(= 2 < 1); (d P(0sz* 2).


APPLIED EXERCISES 


16. The actual amount of coffee (in grams) in a 230-gram jar filled by a certain
machine is a random variable whose probability density function is given by

$$f(x) = \begin{cases} 0 & \text{for } x \le 227.5 \\ \frac{1}{5} & \text{for } 227.5 < x < 232.5 \\ 0 & \text{for } x \ge 232.5 \end{cases}$$

Find the probabilities that a 230-gram jar filled by this machine will contain
(a) at most 228.65 grams of coffee;

(b) anywhere from 229.34 to 231.66 grams of coffee;

(c) at least 229.85 grams of coffee.


17. The number of minutes that a flight from Phoenix to Tucson is early or late
is a random variable whose probability density is given by

$$f(x) = \begin{cases} \frac{1}{288}(36 - x^2) & \text{for } -6 < x < 6 \\ 0 & \text{elsewhere} \end{cases}$$

where negative values are indicative of the flight's being early and positive
values are indicative of its being late. Find the probabilities that one of these
flights will be

(a) at least 2 minutes early;

(b) at least 1 minute late;

(c) anywhere from 1 to 3 minutes early;

(d) exactly 5 minutes late.


18. The shelf life (in hours) of a certain perishable packaged food is a random
variable whose probability density function is given by

$$f(x) = \begin{cases} \dfrac{20{,}000}{(x + 100)^3} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Find the probabilities that one of these packages will have a shelf life of
(a) at least 200 hours;

(b) at most 100 hours;

(c) anywhere from 80 to 120 hours.


19. The tread wear (in thousands of kilometers) which car owners get with a 
certain kind of tire is a random variable whose probability density function 
is given by 

ll 
_ [sse * forx > 0 
ло) = [5 forx <0 

Find the probabilities that one of these tires will last 

(a) at most 19,000 kilometers; 

(b) anywhere from 29,000 to 38,000 kilometers; 

(c) at least 48,000 kilometers. 


20. In a certain city the daily consumption of water (in millions of liters) is a 
random variable whose probability density is given by 


xe  forx»0 
0 elsewhere 


f(x) = { 


What are the probabilities that on a given day 

(a) the water consumption in this city is no more than 6 million liters; 

(b) the water supply is inadequate if the daily capacity of this city is 9 
million liters? 


21. The total lifetime (in years) of five-year-old dogs of a certain breed is a 
random variable whose distribution function is given by 


$$F(x) = \begin{cases} 0 & \text{for } x \le 5 \\ 1 - \dfrac{25}{x^2} & \text{for } x > 5 \end{cases}$$


Find the probabilities that such a five-year-old dog will live 
(a) beyond 10 years; 

(b) less than 8 years; 

(c) anywhere from 12 to 15 years. 


3.5 MULTIVARIATE DISTRIBUTIONS 


In Section 3.1 we defined a random variable as a real-valued function defined 
over a sample space with a probability measure, and it stands to reason that 
many different random variables can be defined over one and the same sample
space. With reference to the sample space of Figure 3.1, for example, we con- 
sidered only the random variable whose values were the totals rolled with the 
pair of dice, but we could also have considered the random variable whose values 
are the products of the numbers rolled with the two dice, the random variable 
whose values are the differences between the numbers rolled with the red die 
and the green die, the random variable whose values are 0, 1, or 2 depending on 
the number of dice which come up 2, and so forth. Closer to life, an experiment 
may consist of randomly choosing some of the 345 students attending an elemen- 
tary school, and the principal may be interested in their I.Q.'s, the school nurse 
in their weights, their teachers in the number of days they have been absent, and 
so forth. 

In this section we shall be concerned first with the bivariate case, that is, 
with situations where we are interested at the same time in a pair of random 
variables defined over a joint sample space. Later, we shall extend this discussion 
to the multivariate case, covering any finite number of random variables. 

If x and y are two discrete random variables, we write the probability that
x will take on the value x and y will take on the value y as $P(\mathbf{x} = x, \mathbf{y} = y)$;
thus, $P(\mathbf{x} = x, \mathbf{y} = y)$ is the probability of the intersection of the events $\mathbf{x} = x$
and $\mathbf{y} = y$. As in the univariate case (see page 78), the probabilities associated
with all possible pairs of values (x, y) can be displayed by means of a table.


EXAMPLE 3.12 


Two tablets are selected at random from a bottle containing 3 aspirin, 2 sedative, 
and 4 laxative tablets. If x and y are, respectively, the number of aspirin tablets 
and the number of sedative tablets included among the two tablets drawn from 
the bottle, find the probabilities associated with all possible pairs of values (x, y). 


Solution 


The possible pairs are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2), and (2, 0). To find
the probability associated with (1, 0), for example, observe that we are
concerned with the event of getting 1 aspirin, none of the sedatives, and
hence 1 laxative tablet. The number of ways of selecting 1 of the 3 aspirin
tablets and 1 of the 4 laxative tablets is $\binom{3}{1}\binom{4}{1} = 12$, and the total number
of equally likely ways of selecting 2 of the 9 tablets is $\binom{9}{2} = 36$. By Theorem
2.2 it follows that the probability associated with (1, 0) is 12/36 = 1/3. Similarly,
the probability associated with (1, 1) is

$$\frac{\binom{3}{1}\binom{2}{1}\binom{4}{0}}{36} = \frac{6}{36} = \frac{1}{6}$$

and, continuing this way, we obtain the values shown in the following table:

                 x = 0    x = 1    x = 2
        y = 0     1/6      1/3      1/12
        y = 1     2/9      1/6
        y = 2     1/36


As in the univariate case, it is usually preferable to represent probabilities 
like the ones above by means of a formula; that is, express the probabilities by 
means of a function with the values f(x,y) = P(x = x, y = y) for any pair of 
values (x, y) within the range of the random variables x and y. For instance, we 
shall see in Chapter 5 that we can write 


$$f(x, y) = \frac{\binom{3}{x}\binom{2}{y}\binom{4}{2 - x - y}}{\binom{9}{2}} \quad \text{for } x = 0, 1, 2;\ y = 0, 1, 2;\ 0 \le x + y \le 2$$

for the pair of random variables of Example 3.12.
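The table of Example 3.12 and the formula for f(x, y) can be reproduced by direct counting. The sketch below uses Python's `math.comb` with exact fractions; the constant names are illustrative and do not come from the text.

```python
from fractions import Fraction
from math import comb

ASPIRIN, SEDATIVE, LAXATIVE = 3, 2, 4
TOTAL = ASPIRIN + SEDATIVE + LAXATIVE  # 9 tablets in the bottle
DRAWN = 2


def f(x, y):
    """P(x aspirin and y sedative tablets among the DRAWN tablets), as an exact fraction."""
    if x < 0 or y < 0 or x + y > DRAWN:
        return Fraction(0)
    ways = comb(ASPIRIN, x) * comb(SEDATIVE, y) * comb(LAXATIVE, DRAWN - x - y)
    return Fraction(ways, comb(TOTAL, DRAWN))


table = {(x, y): f(x, y) for x in range(3) for y in range(3) if x + y <= DRAWN}
for (x, y), p in sorted(table.items()):
    print((x, y), p)
```

Working with `Fraction` keeps the entries exact, so they can be compared directly with the fractions in the table.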


DEFINITION 3.6 If x and y are discrete random variables, the function given
by $f(x, y) = P(\mathbf{x} = x, \mathbf{y} = y)$ for each pair of values (x, y) within the range
of x and y is called the joint probability function, or joint probability
distribution, of x and y.


Analogous to Theorem 3.1, it immediately follows from the postulates of probabil- 
ity that 


THEOREM 3.7 A bivariate function can serve as the joint probability distri- 
bution of a pair of discrete random variables x and y if and only if its 
values, f(x, y), satisfy the conditions 


1. $f(x, y) \ge 0$ for each pair of values (x, y) within its domain;
2. $\sum_x \sum_y f(x, y) = 1$, where the double summation extends over all
possible pairs (x, y) within its domain.
EXAMPLE 3.13
Determine the value of k so that the function given by
f(x, y) = kxy for x = 1, 2, 3; y = 1, 2, 3
can serve as a joint probability distribution.


Solution


Substituting the various values of x and y, we get f(1, 1) = k, f(1, 2) = 2k,
f(1, 3) = 3k, f(2, 1) = 2k, f(2, 2) = 4k, f(2, 3) = 6k, f(3, 1) = 3k,
f(3, 2) = 6k, and f(3, 3) = 9k. To satisfy the first condition of Theorem 3.7,
the constant k must be non-negative, and to satisfy the second condition,

k + 2k + 3k + 2k + 4k + 6k + 3k + 6k + 9k = 1

so that 36k = 1 and k = 1/36. ▲
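The two conditions of Theorem 3.7 for Example 3.13 can be checked with exact arithmetic. This is a small illustrative sketch; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

values = (1, 2, 3)
pairs = list(product(values, values))

# Second condition of Theorem 3.7: k times the sum of x*y over all nine pairs equals 1.
total = sum(x * y for x, y in pairs)
k = Fraction(1, total)

# With this k, every value k*x*y is non-negative (first condition).
nonnegative = all(k * x * y >= 0 for x, y in pairs)
print(total, k, nonnegative)
```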


As in the univariate case, there are many problems in which it is of interest 
to know the probability that the values of two random variables are less than or 
equal to some real numbers x and y. 


DEFINITION 3.7 If x and y are discrete random variables, the function given
by

$$F(x, y) = P(\mathbf{x} \le x, \mathbf{y} \le y) = \sum_{s \le x} \sum_{t \le y} f(s, t) \quad \text{for } -\infty < x < \infty,\ -\infty < y < \infty$$

where f(s, t) is the value of the joint probability distribution of x and y at
(s, t), is called the joint distribution function, or joint cumulative distribution,
of x and y.


In Exercise 6 on page 114, the reader will be asked to prove properties of joint 
distribution functions which are analogous to those of Theorem 3.2. 


EXAMPLE 3.14 


With reference to Example 3.12, find F(1, 1). 
Solution

$$F(1, 1) = P(\mathbf{x} \le 1, \mathbf{y} \le 1) = f(0, 0) + f(0, 1) + f(1, 0) + f(1, 1) = \frac{1}{6} + \frac{2}{9} + \frac{1}{3} + \frac{1}{6} = \frac{8}{9}$$

▲


As in the univariate case, the joint distribution function of two random
variables is defined for all real numbers; for instance, we also get
F(−2, 1) = P(x ≤ −2, y ≤ 1) = 0 and F(3.7, 4.5) = P(x ≤ 3.7, y ≤ 4.5) = 1
for the preceding example.
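Definition 3.7 translates directly into a sum over the table of Example 3.12. The sketch below, with names of our own choosing, evaluates F at the three points used above.

```python
from fractions import Fraction

# Joint probabilities from Example 3.12, keyed by (x, y).
f = {
    (0, 0): Fraction(1, 6), (1, 0): Fraction(1, 3), (2, 0): Fraction(1, 12),
    (0, 1): Fraction(2, 9), (1, 1): Fraction(1, 6),
    (0, 2): Fraction(1, 36),
}


def F(x, y):
    """Joint distribution function (Definition 3.7): sum of f(s, t) over s <= x, t <= y."""
    return sum((p for (s, t), p in f.items() if s <= x and t <= y), Fraction(0))


print(F(1, 1), F(-2, 1), F(3.7, 4.5))
```

Because the sum runs over every tabulated point with s ≤ x and t ≤ y, F is defined for all real arguments, including non-integer ones such as (3.7, 4.5).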

Let us now extend the concepts introduced so far in this section to the 
continuous case. 


DEFINITION 3.8 A bivariate function with values f(x, y), defined over the
xy-plane, is called a joint probability density function of the continuous
random variables x and y if and only if

$$P[(\mathbf{x}, \mathbf{y}) \in A] = \iint_A f(x, y)\,dx\,dy$$

for any region A in the xy-plane.


Analogous to Theorem 3.5, it immediately follows from the postulates of probability that


THEOREM 3.8 A bivariate function can serve as a joint probability density
function of a pair of continuous random variables x and y if its values,
f(x, y), satisfy the conditions

1. $f(x, y) \ge 0$ for $-\infty < x < \infty$, $-\infty < y < \infty$;

2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
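EXAMPLE 3.15
If the joint probability density function of x and y is given by

$$f(x, y) = \begin{cases} \frac{3}{5}x(y + x) & \text{for } 0 < x < 1,\ 0 < y < 2 \\ 0 & \text{elsewhere} \end{cases}$$

find P[(x, y) ∈ A], where A is the region {(x, y) | 0 < x < 1/2, 1 < y < 2}.
Solution

$$\begin{aligned}
P[(\mathbf{x}, \mathbf{y}) \in A] &= P\left(0 < \mathbf{x} < \tfrac{1}{2},\ 1 < \mathbf{y} < 2\right) \\
&= \int_1^2 \int_0^{1/2} \frac{3}{5}x(y + x)\,dx\,dy \\
&= \int_1^2 \left[\frac{3x^2 y}{10} + \frac{x^3}{5}\right]_{x=0}^{x=1/2} dy \\
&= \int_1^2 \left(\frac{3y}{40} + \frac{1}{40}\right) dy = \left[\frac{3y^2}{80} + \frac{y}{40}\right]_1^2 = \frac{11}{80}
\end{aligned}$$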
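The value 11/80 can be confirmed without doing the integration by hand. Simpson's rule integrates polynomials of degree at most 3 exactly, so a tensor-product Simpson rule evaluated with exact fractions reproduces the double integral of this density exactly. This is a sketch under that observation; `density_formula` and `simpson_2d` are our own names.

```python
from fractions import Fraction


def density_formula(x, y):
    """(3/5) x (y + x): the density of Example 3.15 on 0 < x < 1, 0 < y < 2."""
    return Fraction(3, 5) * x * (y + x)


def simpson_2d(g, a, b, c, d, n=2):
    """Tensor-product composite Simpson rule for g over [a, b] x [c, d] (n even).

    The rule is exact for polynomials of degree <= 3 in each variable, so with
    Fraction limits it returns the exact integral of density_formula.
    """
    hx = (b - a) / n
    hy = (d - c) / n
    weights = [1] + [4 if i % 2 else 2 for i in range(1, n)] + [1]
    total = Fraction(0)
    for i, wx in enumerate(weights):
        for j, wy in enumerate(weights):
            total += wx * wy * g(a + i * hx, c + j * hy)
    return total * hx * hy / 9


# Region A of Example 3.15: 0 < x < 1/2, 1 < y < 2 (inside the support).
p = simpson_2d(density_formula, Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2))
print(p)
```

Integrating `density_formula` over the whole support [0, 1] × [0, 2] with the same function gives exactly 1, as the second condition of Theorem 3.8 requires.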


Analogous to Definition 3.7, we have the following definition of the joint
distribution function of two continuous random variables:


DEFINITION 3.9 If x and y are continuous random variables, the function
given by

$$F(x, y) = P(\mathbf{x} \le x, \mathbf{y} \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\,ds\,dt \quad \text{for } -\infty < x < \infty,\ -\infty < y < \infty$$

where f(s, t) is the value of the joint probability density of x and y at (s, t),
is called the joint distribution function of x and y.


The properties of joint distribution functions which the reader will be asked to
prove in Exercise 6 on page 114 for the discrete case hold also for the continuous
case.


As in Section 3.4, we shall limit our discussion to random variables whose
joint distribution function is continuous everywhere and partially differentiable
with respect to each variable for all but a finite set of values of each of the two
random variables.

Analogous to the relationship f(x) = dF(x)/dx of Theorem 3.6, partial
differentiation in Definition 3.9 leads to

$$f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F(x, y)$$


wherever these partial derivatives exist. As in Section 3.4, the joint distribution 
function of two continuous random variables determines their joint density (short 
for joint probability density function) at all points (x, y) where the joint density 
is continuous. Also as in Section 3.4, we generally let the values of joint probability 
densities equal zero wherever they are not defined by the above relationship. 


EXAMPLE 3.16 


If the joint probability density of x and y is given by

$$f(x, y) = \begin{cases} x + y & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find the corresponding joint distribution function.
Solution
If either x < 0 or y < 0, it follows immediately that F(x, y) = 0. For
0 < x < 1 and 0 < y < 1 (Region I of Figure 3.9) we get

$$F(x, y) = \int_0^y \int_0^x (s + t)\,ds\,dt = \frac{1}{2}xy(x + y)$$

for x > 1 and 0 < y < 1 (Region II of Figure 3.9) we get

$$F(x, y) = \int_0^y \int_0^1 (s + t)\,ds\,dt = \frac{1}{2}y(y + 1)$$

for 0 < x < 1 and y > 1 (Region III of Figure 3.9) we get

$$F(x, y) = \int_0^1 \int_0^x (s + t)\,ds\,dt = \frac{1}{2}x(x + 1)$$
Figure 3.9 Diagram for Example 3.16.


and for x > 1 and y > 1 (Region IV of Figure 3.9) we get

$$F(x, y) = \int_0^1 \int_0^1 (s + t)\,ds\,dt = 1$$

Since the joint distribution function is everywhere continuous, the boundaries
between any two of these regions can be included in either one, and
we can write

$$F(x, y) = \begin{cases} 0 & \text{for } x \le 0 \text{ or } y \le 0 \\ \frac{1}{2}xy(x + y) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ \frac{1}{2}y(y + 1) & \text{for } x \ge 1,\ 0 < y < 1 \\ \frac{1}{2}x(x + 1) & \text{for } 0 < x < 1,\ y \ge 1 \\ 1 & \text{for } x \ge 1,\ y \ge 1 \end{cases}$$

▲


EXAMPLE 3.17


If the joint distribution function of x and y is given by

$$F(x, y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}) & \text{for } x > 0 \text{ and } y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find P(1 < x < 3, 1 < y < 2).
Solution


This probability may be obtained directly from the joint distribution function
(see Exercise 15 on page 115), but let us find it here by first getting the
joint probability density function and then integrating it over the appropriate
region. Partial differentiation yields

$$\frac{\partial^2}{\partial x\,\partial y} F(x, y) = e^{-(x + y)}$$

for x > 0 and y > 0, and 0 elsewhere, so that the joint probability density
of the two random variables is given by

$$f(x, y) = \begin{cases} e^{-(x + y)} & \text{for } x > 0 \text{ and } y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Now, the probability that x will take on a value on the interval from 1 to
3 and y will take on a value on the interval from 1 to 2 is given by the
double integral

$$\int_1^2 \int_1^3 e^{-(x + y)}\,dx\,dy = (e^{-1} - e^{-3})(e^{-1} - e^{-2}) = e^{-2} - e^{-3} - e^{-4} + e^{-5} = 0.074$$

▲
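A brute-force check of Example 3.17 integrates the density numerically over the rectangle and compares the result with the closed form. The sketch uses a plain midpoint rule on a 400 × 400 grid; the grid size and function names are arbitrary choices of ours.

```python
import math


def density(x, y):
    """Joint density obtained in Example 3.17: e^{-(x + y)} for x > 0, y > 0."""
    return math.exp(-(x + y)) if x > 0 and y > 0 else 0.0


def midpoint_2d(g, a, b, c, d, n=400):
    """Midpoint rule on an n-by-n grid over the rectangle [a, b] x [c, d]."""
    hx, hy = (b - a) / n, (d - c) / n
    return hx * hy * sum(
        g(a + (i + 0.5) * hx, c + (j + 0.5) * hy)
        for i in range(n)
        for j in range(n)
    )


numeric = midpoint_2d(density, 1.0, 3.0, 1.0, 2.0)  # 1 < x < 3, 1 < y < 2
exact = (math.exp(-1) - math.exp(-3)) * (math.exp(-1) - math.exp(-2))
print(f"numeric {numeric:.6f}, exact {exact:.6f}, rounded {exact:.3f}")
```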


For two random variables, the joint probability density is, geometrically 
speaking, a surface, and the probability which we have calculated in Example 
3.17 is given by the volume under this surface shown in Figure 3.10. 

All of the preceding definitions concerning two random variables can be
generalized to the multivariate case, where there are n random variables. The
values of the joint probability distribution of the discrete random variables $x_1$,
$x_2$, ..., and $x_n$, defined over the same sample space S, are given by

$$f(x_1, x_2, \ldots, x_n) = P(\mathbf{x}_1 = x_1, \mathbf{x}_2 = x_2, \ldots, \mathbf{x}_n = x_n)$$

for each n-tuple $(x_1, x_2, \ldots, x_n)$ within the range of the random variables. Also
the values of their joint distribution function are given by

$$F(x_1, x_2, \ldots, x_n) = P(\mathbf{x}_1 \le x_1, \mathbf{x}_2 \le x_2, \ldots, \mathbf{x}_n \le x_n)$$

for $-\infty < x_1 < \infty$, $-\infty < x_2 < \infty$, ..., $-\infty < x_n < \infty$.
Figure 3.10 Diagram for Example 3.17.


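EXAMPLE 3.18


If the joint probability distribution of the discrete random variables x, y, and z
is given by

$$f(x, y, z) = \frac{(x + y)z}{63} \quad \text{for } x = 1, 2;\ y = 1, 2, 3;\ z = 1, 2$$

find P(x = 2, y + z ≤ 3).


Solution


$$P(\mathbf{x} = 2, \mathbf{y} + \mathbf{z} \le 3) = f(2, 1, 1) + f(2, 1, 2) + f(2, 2, 1) = \frac{3}{63} + \frac{6}{63} + \frac{4}{63} = \frac{13}{63}$$

▲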
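Example 3.18 can be checked by enumerating the twelve points of the support with exact fractions. This sketch assumes the distribution reads f(x, y, z) = (x + y)z/63 as printed above; the variable names are ours.

```python
from fractions import Fraction
from itertools import product


def f(x, y, z):
    """Joint probability distribution of Example 3.18: (x + y) z / 63."""
    return Fraction((x + y) * z, 63)


support = list(product((1, 2), (1, 2, 3), (1, 2)))

# The probabilities over the twelve points add up to 1.
total = sum(f(x, y, z) for x, y, z in support)

# P(x = 2, y + z <= 3): the points (2, 1, 1), (2, 1, 2) and (2, 2, 1).
prob = sum(f(x, y, z) for x, y, z in support if x == 2 and y + z <= 3)
print(total, prob)
```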


In the continuous case, probabilities are again obtained by integrating the
joint probability density, and the joint distribution function is given by

$$F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(t_1, t_2, \ldots, t_n)\,dt_1\,dt_2 \cdots dt_n$$

for $-\infty < x_1 < \infty$, $-\infty < x_2 < \infty$, ..., $-\infty < x_n < \infty$. Also, partial
differentiation yields the formula

$$f(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1\,\partial x_2 \cdots \partial x_n} F(x_1, x_2, \ldots, x_n)$$

wherever these partial derivatives exist.
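EXAMPLE 3.19
If the trivariate probability density of $x_1$, $x_2$, and $x_3$ is given by

$$f(x_1, x_2, x_3) = \begin{cases} (x_1 + x_2)e^{-x_3} & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1,\ x_3 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find
(a) $P[(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \in A]$, where A is the region

$$A = \{(x_1, x_2, x_3) \mid 0 < x_1 < \tfrac{1}{2},\ \tfrac{1}{2} < x_2 < 1,\ x_3 < 1\}$$

(b) the joint distribution function of the three random variables $x_1$, $x_2$, and $x_3$.


Solution


(a)

$$\begin{aligned}
P[(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \in A] &= P\left(0 < \mathbf{x}_1 < \tfrac{1}{2},\ \tfrac{1}{2} < \mathbf{x}_2 < 1,\ \mathbf{x}_3 < 1\right) \\
&= \int_0^1 \int_{1/2}^1 \int_0^{1/2} (x_1 + x_2)e^{-x_3}\,dx_1\,dx_2\,dx_3 \\
&= \int_0^1 \int_{1/2}^1 \left(\frac{1}{8} + \frac{x_2}{2}\right) e^{-x_3}\,dx_2\,dx_3 \\
&= \int_0^1 \frac{1}{4} e^{-x_3}\,dx_3 \\
&= \frac{1}{4}(1 - e^{-1}) = 0.158
\end{aligned}$$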


(b) It will be left to the reader in Exercise 19 on page 116 to use the results
of Example 3.16 to show that the joint distribution function of the
random variables is given by

$$F(x_1, x_2, x_3) = \begin{cases} 0 & \text{for } x_1 \le 0,\ x_2 \le 0, \text{ or } x_3 \le 0 \\ \frac{1}{2}x_1 x_2(x_1 + x_2)(1 - e^{-x_3}) & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1,\ x_3 > 0 \\ \frac{1}{2}x_2(x_2 + 1)(1 - e^{-x_3}) & \text{for } x_1 \ge 1,\ 0 < x_2 < 1,\ x_3 > 0 \\ \frac{1}{2}x_1(x_1 + 1)(1 - e^{-x_3}) & \text{for } 0 < x_1 < 1,\ x_2 \ge 1,\ x_3 > 0 \\ 1 - e^{-x_3} & \text{for } x_1 \ge 1,\ x_2 \ge 1,\ x_3 > 0 \end{cases}$$

▲
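Part (a) of Example 3.19 can be checked numerically. On A the density factors into (x1 + x2) times e^{-x3}, so the triple integral is a product of a double integral and a single integral. The midpoint rule used here is exact for the linear factor and accurate to about 1e-6 for the exponential one. The helper `midpoint` and the panel count n = 200 are our own choices.

```python
import math


def midpoint(g, a, b, n=200):
    """One-dimensional midpoint rule for g over [a, b] with n panels."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# Double integral of (x1 + x2) over 0 < x1 < 1/2, 1/2 < x2 < 1.
inner = midpoint(lambda x2: midpoint(lambda x1: x1 + x2, 0.0, 0.5), 0.5, 1.0)
# x3 runs only over (0, 1): the density is zero for x3 <= 0.
outer = midpoint(lambda x3: math.exp(-x3), 0.0, 1.0)
prob = inner * outer
print(f"inner = {inner:.6f}, P = {prob:.3f}")
```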


THEORETICAL EXERCISES 
1. If x and y have the joint probabilities shown in the following table 


x 
0 1 2 
0. | dec ok 
oes i 3 
LATO ITE. 
3 | is 
find
(a) P(x = 1, y = 2); (b) P(x = 0, 1 < y < 3);
(c) P(x + y < 1); (d) P(x > y).


2. If the joint probability distribution of x and y is given by

$$f(x, y) = c(x^2 + y^2) \quad \text{for } x = -1, 0, 1, 3;\ y = -1, 2, 3$$

find
(a) the value of c;
(b) P(x = 0, y < 2);
(c) P(x = 1, y > 2);
(d) P(x > 2 − y).


3. Show that

$$f(x, y) = k(2y - x) \quad \text{for } x = 0, 3;\ y = 0, 1, 2$$

cannot serve as the joint probability distribution of two random variables for
any value of k.
4. With reference to Exercise 1, find the following values of the joint distribution
function of the two random variables:
(a) F(1.2, 0.9); (b) F(,0);
(c) F(−3, 1.5); (d) F(4, 2.7).


5. If the joint probability distribution of x and y is given by

$$f(x, y) = \frac{1}{30}(x + y) \quad \text{for } x = 0, 1, 2, 3;\ y = 0, 1, 2$$

construct a table showing the values of the joint distribution function of the
two random variables at the twelve points (0, 0), (0, 1), ..., (3, 2).


6. If F(x, y) is the value of the joint distribution function of two discrete random
variables x and y at (x, y), show that

(a) F(−∞, −∞) = 0;

(b) F(∞, ∞) = 1;

(c) if a < b and c < d, then F(a, c) ≤ F(b, d).


7. Determine k so that

$$f(x, y) = \begin{cases} kx(x - y) & \text{for } 0 < x < 1,\ -x < y < x \\ 0 & \text{elsewhere} \end{cases}$$

can serve as a probability density function.


8. If x and y have the joint probability density given by


[ev for0<x<10<y<1,xt+y<1 
fœ») = l elsewhere 

find 

(a) the value of k;

(b) P(x + y < 2).


9. If the joint probability density function of x and y is given by

$$f(x, y) = \begin{cases} 2 & \text{for } x > 0,\ y > 0,\ x + y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find
(a) P(x < 3, y < 3);
(b) P(x + y > 3);
(c) P(x > 2y).


10. If the joint probability density of x and y is given by

1

ET for0 < x <
л-р ЕС cs au
0 elsewhere

find the probability that the sum of the values taken on by the two random
variables will exceed }.


11. Find the expressions for the values of the joint distribution function of the
random variables x and y of Exercise 9 which apply when

(a) x < 0 or y < 0;

(b) x ≥ 1, y ≥ 1;

(c) x > 0, y > 0, x + y < 1, and use it to verify the result of part (a) of
Exercise 9.

For how many other parts of the xy-plane will the joint distribution function
of these random variables have to be given separately?


12. If the joint distribution function of two random variables x and y is given by

1-e)1-e) forx>0,y>0
Fis») =] js еа
elsewhere

find P(x < 1, y < 2).


13. With reference to Exercise 12, find

(a) the joint probability density of the two random variables;

(b) P(1 < x < 2, 1 < y < 2).


14. If the joint distribution function of two random variables x and y is given by

$$F(x, y) = \begin{cases} 1 - e^{-x} - e^{-y} + e^{-x-y} & \text{for } x > 0,\ y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) the joint probability density of the two random variables;

(b) P(x + y > 3).


15. If F(x, y) is the value of the joint distribution function of two continuous
random variables x and y at (x, y), express P(a < x ≤ b, c < y ≤ d) in
terms of F(a, c), F(a, d), F(b, c), and F(b, d). Observe that the result holds
also for discrete random variables.


16. Use the formula asked for in Exercise 15 to rework

(a) Example 3.17 on page 109;

(b) part (b) of Exercise 13.


17. If the joint probability distribution of x, y, and z is given by

$$f(x, y, z) = kxyz \quad \text{for } x = 1, 2;\ y = 1, 2, 3;\ z = 1, 2$$

find

(a) the value of k;

(b) P(x = 1, y < 2, z = 1);
(c) P(x = 2, y + z = 4).
If F(x, y, z) is the value of the joint distribution function of these three
random variables at (x, y, z), also find

(d) F(2, 1, 2);

(e) F(1, 0, 1);

(f) F(4, 4, 4).


18. If the joint probability density of the three random variables x, y, and z is
given by

$$f(x, y, z) = \begin{cases} kxy(1 - z) & \text{for } 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1,\ x + y + z < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) the value of k;
(b) P(x + y < 3).


19. Verify the result of part (b) of Example 3.19. 


20. If the joint probability density of the three random variables x, y, and z is
given by

$$f(x, y, z) = \begin{cases} \frac{1}{3}(2x + 3y + z) & \text{for } 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1 \\ 0 & \text{elsewhere} \end{cases}$$


find 
(а) Р(х= },у = 3,2 = 3); 
(b) Р(х<3,у< 2,2 < 1). 


APPLIED EXERCISES 


21. Suppose that we roll a pair of balanced dice, x is the number of dice that 
come up 1, and y is the number of dice that come up 4, 5, or 6. 

(a) Draw a diagram like that of Figure 3.1, showing the values of x and y 

associated with each of the 36 equally likely points of the sample space. 


(b) Construct a table showing the values of the joint probability distribution 
of x and y. 


22. Two textbooks are selected at random from a shelf that contains 3 statistics 
texts, 2 mathematics texts, and 3 physics texts. If x is the number of statistics 
texts and y is the number of mathematics texts selected, construct a table 
showing the values of the joint probability distribution of x and y. 


23. Let x denote the number of heads and y the number of heads minus the
number of tails obtained in three tosses of a balanced coin. Find the values 
of the joint probability distribution of x and y. 
24. A marksman is aiming at a circular target of radius 1. If we draw a rectangular
system of coordinates with its origin at the center of the target, the coordinates
of the point of impact, (x, y), are random variables having the joint probability
density

$$f(x, y) = \begin{cases} \frac{1}{\pi} & \text{for } 0 < x^2 + y^2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$

Find

(a) P[(x, y) ∈ A], where A is the sector of the circle in the first quadrant
between the radii along the lines y = 0 and y = x;

(b) P[(x, y) є B], where B = {(х,у)|0 < х? + ^ < 3. 

25. A certain college gives aptitude tests in the sciences and the humanities to 
all entering freshmen. If x and y are, respectively, the proportions of correct 
answers a student gets on the tests, the joint distribution of these random 
variables can be approximated with the joint probability density 


$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

What proportions of the students will get

(a) less than 0.4 on both tests;
(b) more than 0.8 on the science test and less than 0.5 on the humanities test?


26. If p, the price of a certain commodity (in dollars), and s, total sales (in 
10,000 units), are random variables whose joint distribution can be approxi- 
mated with the joint probability density 


$$f(p, s) = \begin{cases} 5pe^{-ps} & \text{for } 0.20 < p < 0.40,\ s > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find the probabilities that
(a) the price will be less than 30 cents and sales will exceed 20,000 units;
(b) the price will be between 25 cents and 30 cents and sales will be less than 10,000 units.


3.6 MARGINAL DISTRIBUTIONS


To introduce the concept of a marginal distribution, let us consider the following 
example: 
EXAMPLE 3.20


In Example 3.12 on page 103 we derived the joint distribution of the random 
variables x and y, the number of aspirin tablets and the number of sedative 
tablets included among two tablets drawn at random from a bottle containing 
three aspirin, two sedative, and four laxative tablets. Find the probability distribu- 
tion of x alone and that of y alone. 


Solution 


The results of Example 3.12 are shown in the following table, together with the marginal totals, that is, the totals of the respective rows and columns:

                      x
              0       1       2     | Row totals
         0   1/6     1/3     1/12   |   7/12
    y    1   2/9     1/6      0     |   7/18
         2   1/36     0       0     |   1/36
    Column
    totals   5/12    1/2     1/12   |    1


Observe that the numbers in the bottom margin are, in fact, the probabilities 
that the random variable x will take on the values 0, 1, and 2. In other 
words, the column totals give the values g(x) of the probability distribution 
of x; that is, 


$$g(0) = f(0,0) + f(0,1) + f(0,2) = \sum_{y=0}^{2} f(0, y)$$

$$g(1) = f(1,0) + f(1,1) + f(1,2) = \sum_{y=0}^{2} f(1, y)$$

$$g(2) = f(2,0) + f(2,1) + f(2,2) = \sum_{y=0}^{2} f(2, y)$$

More compactly,

$$g(x) = \sum_{y=0}^{2} f(x, y) \quad \text{for } x = 0, 1, 2$$




and by the same token, the values h(y) of the probability distribution of y 
are given by the row totals 


$$h(y) = \sum_{x=0}^{2} f(x, y) \quad \text{for } y = 0, 1, 2$$

▲
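The row and column totals can be reproduced with exact fractions. The following is a minimal Python sketch. It assumes that the joint distribution of Example 3.12 has the counting form f(x, y) = C(3, x)C(2, y)C(4, 2 − x − y)/C(9, 2), which gives the entries of the table. The names f, g, h and values are our own.

```python
from fractions import Fraction
from math import comb

# Assumed form of the joint distribution of Example 3.12:
# x aspirin and y sedative tablets among 2 drawn from
# 3 aspirin, 2 sedative and 4 laxative tablets (9 in all).
def f(x, y):
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(2, y) * comb(4, 2 - x - y), comb(9, 2))

values = range(3)  # x and y each take on the values 0, 1, 2

# Marginal distributions: g(x) sums over y (column totals),
# h(y) sums over x (row totals).
g = {x: sum(f(x, y) for y in values) for x in values}
h = {y: sum(f(x, y) for x in values) for y in values}

print(g)  # {0: Fraction(5, 12), 1: Fraction(1, 2), 2: Fraction(1, 12)}
print(h)  # {0: Fraction(7, 12), 1: Fraction(7, 18), 2: Fraction(1, 36)}
```

Exact `Fraction` arithmetic is used because it keeps the totals equal to the tabulated values without rounding.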


We are thus led to the following general definition: 


DEFINITION 3.10 If x and y are discrete random variables and f(x, y) is the value of their joint probability distribution at (x, y), the function given by

$$g(x) = \sum_{y} f(x, y)$$

for each x within the range of x, is called the marginal distribution of x. Correspondingly, the function given by

$$h(y) = \sum_{x} f(x, y)$$

for each y within the range of y, is called the marginal distribution of y.


When x and y are continuous random variables, the probability distributions are 
replaced by probability densities, the summations are replaced by integrals, and 
we get 


DEFINITION 3.11 If x and y are continuous random variables and f(x, y) is the value of their joint probability density at (x, y), the function given by

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \quad \text{for } -\infty < x < \infty$$

is called the marginal density of x. Correspondingly, the function given by

$$h(y) = \int_{-\infty}^{\infty} f(x, y)\, dx \quad \text{for } -\infty < y < \infty$$

is called the marginal density of y.
EXAMPLE 3.21
Given the joint density

$$f(x, y) = \begin{cases} \tfrac{2}{3}(x + 2y) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find the marginal densities of x and y.

Solution

Performing the necessary integrations, we get

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_{0}^{1} \tfrac{2}{3}(x + 2y)\, dy = \tfrac{2}{3}(x + 1)$$

for 0 < x < 1, and g(x) = 0 elsewhere. Likewise,

$$h(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_{0}^{1} \tfrac{2}{3}(x + 2y)\, dx = \tfrac{1}{3}(1 + 4y)$$

for 0 < y < 1, and h(y) = 0 elsewhere. ▲
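The two integrations can be checked numerically. This is a minimal sketch that uses the density f(x, y) = (2/3)(x + 2y) on the unit square. The midpoint-rule helper `midpoint` and the sample points are our own choices. Because the integrands are linear, the rule is exact here up to floating-point error.

```python
# Joint density (2/3)(x + 2y) on the unit square, as in Example 3.21.
def f(x, y):
    return 2 / 3 * (x + 2 * y) if 0 < x < 1 and 0 < y < 1 else 0.0


def midpoint(func, a, b, n=1000):
    """Approximate the integral of func over (a, b) with the midpoint rule."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))


# Marginal densities: integrate the other variable out over its support (0, 1).
def g(x):
    return midpoint(lambda y: f(x, y), 0.0, 1.0)


def h(y):
    return midpoint(lambda x: f(x, y), 0.0, 1.0)


for t in (0.1, 0.5, 0.9):
    # numerical values next to the closed forms 2(t + 1)/3 and (1 + 4t)/3
    print(t, round(g(t), 6), round(2 * (t + 1) / 3, 6),
          round(h(t), 6), round((1 + 4 * t) / 3, 6))
```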


When we are dealing with more than two random variables, we can speak not only of the marginal distributions of the individual random variables, but also of the joint marginal distributions of several of the random variables. If the joint probability distribution of the discrete random variables x₁, x₂, ..., and xₙ has the values f(x₁, x₂, ..., xₙ), the marginal distribution of x₁ alone is given by

$$g(x_1) = \sum_{x_2} \cdots \sum_{x_n} f(x_1, x_2, \ldots, x_n)$$

for all values within the range of x₁, the joint marginal distribution of x₁, x₂, and x₃ is given by

$$m(x_1, x_2, x_3) = \sum_{x_4} \cdots \sum_{x_n} f(x_1, x_2, \ldots, x_n)$$

for all values within the range of x₁, x₂, and x₃, and other marginal distributions can be defined in the same way. For the continuous case, probability distributions are replaced by probability densities, summations are replaced by integrals, and if the joint probability density of the continuous random variables x₁, x₂, ..., and xₙ has the values f(x₁, x₂, ..., xₙ), the marginal density of x₂ alone is given by

$$h(x_2) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_3 \cdots dx_n$$

for −∞ < x₂ < ∞, the joint marginal density of x₁ and xₙ is given by

$$\varphi(x_1, x_n) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_2\, dx_3 \cdots dx_{n-1}$$

for −∞ < x₁ < ∞ and −∞ < xₙ < ∞, and so forth.
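For discrete random variables stored as a table, the repeated summations reduce to one rule: add up the probabilities of all points that agree on the coordinates being kept. This is a minimal sketch. The helper `joint_marginal`, the dictionary representation, and the four-coin example are hypothetical illustrations, not taken from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def joint_marginal(joint, keep):
    """Sum a joint distribution {(x1, ..., xn): prob} over every coordinate
    whose (0-based) index is not listed in `keep`."""
    result = defaultdict(Fraction)
    for point, prob in joint.items():
        key = tuple(point[i] for i in keep)
        result[key] += prob
    return dict(result)


# Hypothetical example: four flips of a balanced coin, each x_i = 0 or 1.
joint = {pt: Fraction(1, 16) for pt in product((0, 1), repeat=4)}

m = joint_marginal(joint, keep=(0, 1, 2))  # m(x1, x2, x3): sum over x4
g = joint_marginal(joint, keep=(0,))       # g(x1): sum over x2, x3, x4
print(g)  # {(0,): Fraction(1, 2), (1,): Fraction(1, 2)}
```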


EXAMPLE 3.22
Considering again the trivariate density

$$f(x_1, x_2, x_3) = \begin{cases} (x_1 + x_2)e^{-x_3} & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1,\ x_3 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

of Example 3.19, find the joint marginal density of x₁ and x₃, and the marginal density of x₁.

Solution

Performing the necessary integration, we find that the joint marginal density of x₁ and x₃ is given by

$$m(x_1, x_3) = \int_{0}^{1} (x_1 + x_2)e^{-x_3}\, dx_2 = \left(x_1 + \tfrac{1}{2}\right)e^{-x_3}$$

for 0 < x₁ < 1 and x₃ > 0, and m(x₁, x₃) = 0 elsewhere. Using this result, we find that the marginal density of x₁ alone is given by

$$g(x_1) = \int_{0}^{\infty} \int_{0}^{1} f(x_1, x_2, x_3)\, dx_2\, dx_3 = \int_{0}^{\infty} m(x_1, x_3)\, dx_3 = \int_{0}^{\infty} \left(x_1 + \tfrac{1}{2}\right)e^{-x_3}\, dx_3 = x_1 + \tfrac{1}{2}$$

for 0 < x₁ < 1, and g(x₁) = 0 elsewhere. ▲


Corresponding to the various marginal and joint marginal distributions and 
densities we have introduced in this section, we can also define marginal and 
joint marginal distribution functions. Some problems relating to such distribution
functions will be left to the reader in Exercises 5, 10, and 11 on pages 129 and 130. 


3.7 CONDITIONAL DISTRIBUTIONS


In Chapter 2 we defined the conditional probability of event A given event B as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

provided P(B) ≠ 0. Suppose now that A and B are the events x = x and y = y, so that we can write

$$P(\mathbf{x} = x \mid \mathbf{y} = y) = \frac{P(\mathbf{x} = x, \mathbf{y} = y)}{P(\mathbf{y} = y)} = \frac{f(x, y)}{h(y)}$$

provided P(y = y) = h(y) ≠ 0, where f(x, y) is the value of the joint probability distribution of x and y at (x, y) and h(y) is the value of the marginal distribution of y at y. Denoting the conditional probability by f(x|y) to indicate that x is a variable and y is fixed, let us now make the following definition:


DEFINITION 3.12 If f(x, y) is the value of the joint probability distribution of the discrete random variables x and y at (x, y) and h(y) is the value of the marginal distribution of y at y, the function given by

$$f(x \mid y) = \frac{f(x, y)}{h(y)} \qquad h(y) \neq 0$$

for each x within the range of x, is called the conditional distribution of x given y = y. Correspondingly, if g(x) is the value of the marginal distribution of x at x, the function given by

$$w(y \mid x) = \frac{f(x, y)}{g(x)} \qquad g(x) \neq 0$$

for each y within the range of y, is called the conditional distribution of y given x = x.
EXAMPLE 3.23

Referring to Example 3.20, find the conditional distribution of x given y = 1.

Solution

Substituting the appropriate values from the table on page 118, we get

$$f(0 \mid 1) = \frac{2/9}{7/18} = \frac{4}{7}$$

$$f(1 \mid 1) = \frac{1/6}{7/18} = \frac{3}{7}$$

$$f(2 \mid 1) = \frac{0}{7/18} = 0$$

▲
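Definition 3.12 amounts to dividing one row of the joint table by its row total. The following minimal sketch hard-codes the table of Example 3.20 as a dictionary, with missing cells treated as 0. The function name `conditional_x_given_y` is hypothetical.

```python
from fractions import Fraction

# Joint distribution table of Example 3.20, keyed by (x, y); missing cells are 0.
f = {(0, 0): Fraction(1, 6), (1, 0): Fraction(1, 3), (2, 0): Fraction(1, 12),
     (0, 1): Fraction(2, 9), (1, 1): Fraction(1, 6),
     (0, 2): Fraction(1, 36)}


def conditional_x_given_y(f, y, xs=(0, 1, 2)):
    """f(x|y) = f(x, y) / h(y), defined only when h(y) != 0."""
    h_y = sum(f.get((x, y), Fraction(0)) for x in xs)
    if h_y == 0:
        raise ValueError("h(y) = 0, so f(x|y) is undefined")
    return {x: f.get((x, y), Fraction(0)) / h_y for x in xs}


print(conditional_x_given_y(f, 1))
# {0: Fraction(4, 7), 1: Fraction(3, 7), 2: Fraction(0, 1)}
```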


When x and y are continuous random variables, the probability distributions 
are replaced by probability densities, and we get 


DEFINITION 3.13 If f(x, y) is the value of the joint density of the continuous random variables x and y at (x, y) and h(y) is the value of the marginal density of y at y, the function given by

$$f(x \mid y) = \frac{f(x, y)}{h(y)} \qquad h(y) \neq 0$$

for −∞ < x < ∞, is called the conditional density of x given y = y. Correspondingly, if g(x) is the value of the marginal density of x at x, the function given by

$$w(y \mid x) = \frac{f(x, y)}{g(x)} \qquad g(x) \neq 0$$

for −∞ < y < ∞, is called the conditional density of y given x = x.


EXAMPLE 3.24

Referring to Example 3.21, find the conditional density of x given y = y, and use it to evaluate the probability P(x ≤ 1/2 | y = 1/2).

Solution
Using the results obtained on page 120, we have

$$f(x \mid y) = \frac{f(x, y)}{h(y)} = \frac{\tfrac{2}{3}(x + 2y)}{\tfrac{1}{3}(1 + 4y)} = \frac{2x + 4y}{1 + 4y}$$

for 0 < x < 1, and f(x|y) = 0 elsewhere. Now,

$$f\left(x \mid \tfrac{1}{2}\right) = \frac{2x + 2}{3}$$

and we can write

$$P\left(x \le \tfrac{1}{2} \mid y = \tfrac{1}{2}\right) = \int_{0}^{1/2} \frac{2x + 2}{3}\, dx = \frac{5}{12}$$

It is of interest to note that in Figure 3.11 this probability is given by the ratio of the area of trapezoid ABCD to the area of trapezoid AEFD. ▲

Figure 3.11 Diagram for Example 3.24.
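The value 5/12 can be confirmed both exactly and numerically. This is a minimal sketch. It evaluates the conditional density (2x + 4y)/(1 + 4y) at y = 1/2 and integrates it over (0, 1/2). The function name `f_cond` and the number of midpoint-rule steps are our own choices.

```python
from fractions import Fraction


# Conditional density of x given y = y from Example 3.24 (zero outside 0 < x < 1).
def f_cond(x, y):
    return (2 * x + 4 * y) / (1 + 4 * y) if 0 < x < 1 else 0.0


# With y = 1/2 the density is (2x + 2)/3, whose antiderivative is (x^2 + 2x)/3.
half = Fraction(1, 2)
exact = (half**2 + 2 * half) / 3
print(exact)  # 5/12

# Midpoint-rule check of P(x <= 1/2 | y = 1/2).
n = 10_000
step = 0.5 / n
approx = step * sum(f_cond((k + 0.5) * step, 0.5) for k in range(n))
print(round(approx, 6))  # 0.416667
```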
EXAMPLE 3.25
Given the joint density function

$$f(x, y) = \begin{cases} 4xy & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find the marginal densities of x and y, and the conditional density of x given y = y.

Solution

Performing the necessary integrations, we get

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_{0}^{1} 4xy\, dy = 2xy^2 \Big|_{y=0}^{y=1} = 2x$$

for 0 < x < 1, and g(x) = 0 elsewhere; also

$$h(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_{0}^{1} 4xy\, dx = 2x^2 y \Big|_{x=0}^{x=1} = 2y$$

for 0 < y < 1, and h(y) = 0 elsewhere. Then, substituting into the formula for a conditional density, we get

$$f(x \mid y) = \frac{f(x, y)}{h(y)} = \frac{4xy}{2y} = 2x$$

for 0 < x < 1, and f(x|y) = 0 elsewhere. ▲


When we are dealing with more than two random variables, whether continuous or discrete, we can consider various different kinds of conditional distributions or densities. For instance, if f(x₁, x₂, x₃, x₄) is the value of the joint distribution of the discrete random variables x₁, x₂, x₃, and x₄ at (x₁, x₂, x₃, x₄), we can write

$$p(x_3 \mid x_1, x_2, x_4) = \frac{f(x_1, x_2, x_3, x_4)}{g(x_1, x_2, x_4)} \qquad g(x_1, x_2, x_4) \neq 0$$

for the value of the conditional distribution of x₃ given x₁ = x₁, x₂ = x₂, and x₄ = x₄, at x₃, where g(x₁, x₂, x₄) is the value of the joint marginal distribution of x₁, x₂, and x₄ at (x₁, x₂, x₄). We can also write

$$q(x_2, x_4 \mid x_1, x_3) = \frac{f(x_1, x_2, x_3, x_4)}{m(x_1, x_3)} \qquad m(x_1, x_3) \neq 0$$

for the value of the joint conditional distribution of x₂ and x₄ given x₁ = x₁ and x₃ = x₃, at (x₂, x₄), or

$$r(x_2, x_3, x_4 \mid x_1) = \frac{f(x_1, x_2, x_3, x_4)}{b(x_1)} \qquad b(x_1) \neq 0$$

for the value of the joint conditional distribution of x₂, x₃, and x₄ given x₁ = x₁, at (x₂, x₃, x₄).

When we are dealing with two or more random variables, questions of independence are usually of great importance. In Example 3.25 we see that f(x|y) = 2x does not depend on the given value y = y, but this is clearly not the case in Example 3.24, where

$$f(x \mid y) = \frac{2x + 4y}{1 + 4y}$$

Whenever the values of the conditional distribution of x given y = y do not depend on y, it follows that f(x|y) = g(x), and hence the formulas of Definitions 3.12 and 3.13 yield

$$f(x, y) = f(x \mid y) \cdot h(y) = g(x) \cdot h(y)$$

That is, the values of the joint distribution are given by the products of the corresponding values of the two marginal distributions. Generalizing from this observation, let us now make the following definition:


DEFINITION 3.14 If f(x₁, x₂, ..., xₙ) is the value of the joint probability distribution of the n discrete random variables x₁, x₂, ..., xₙ at (x₁, x₂, ..., xₙ), and fᵢ(xᵢ) is the value of the marginal distribution of xᵢ at xᵢ for i = 1, 2, ..., n, these random variables are independent if and only if

$$f(x_1, x_2, \ldots, x_n) = f_1(x_1) \cdot f_2(x_2) \cdot \ldots \cdot f_n(x_n)$$

for all (x₁, x₂, ..., xₙ) within their range.

To give a corresponding definition for continuous random variables, we substitute the word "density" for the word "distribution."
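For two discrete random variables, Definition 3.14 can be checked cell by cell. The following minimal sketch compares f(x, y) with g(x)·h(y) at every point. The helper `is_independent` and the two-coin table are our own illustrations. Exact fractions are used so that the equality test is not spoiled by floating-point rounding.

```python
from fractions import Fraction
from itertools import product


def is_independent(f, xs, ys):
    """Check f(x, y) == g(x) * h(y) at every point (Definition 3.14 with n = 2)."""
    g = {x: sum(f.get((x, y), 0) for y in ys) for x in xs}
    h = {y: sum(f.get((x, y), 0) for x in xs) for y in ys}
    return all(f.get((x, y), 0) == g[x] * h[y] for x, y in product(xs, ys))


# Table of Example 3.20: not independent, since f(1, 1) = 1/6 but g(1)h(1) = 7/36.
tablets = {(0, 0): Fraction(1, 6), (1, 0): Fraction(1, 3), (2, 0): Fraction(1, 12),
           (0, 1): Fraction(2, 9), (1, 1): Fraction(1, 6), (0, 2): Fraction(1, 36)}
print(is_independent(tablets, range(3), range(3)))  # False

# Two flips of a balanced coin, each pair having probability 1/4: independent.
coins = {(a, b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}
print(is_independent(coins, (0, 1), (0, 1)))  # True
```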




With this definition of independence, it can easily be verified that the three random variables of Example 3.22 are not independent, but that the two random variables x₁ and x₃, and also the two random variables x₂ and x₃, are pairwise independent (see Exercise 12 on page 130).

The following examples serve to illustrate the use of Definition 3.14 in 
finding probabilities relating to several independent random variables: 


EXAMPLE 3.26 


Considering n flips of a balanced coin, let x₁ be the number of heads (0 or 1) obtained on the first flip, x₂ the number of heads obtained on the second flip, ..., and xₙ the number of heads obtained on the nth flip. Find the joint probability distribution of these n random variables.

Solution
Each random variable xᵢ, for i = 1, 2, ..., n, has the probability distribution

$$f_i(x_i) = \begin{cases} \tfrac{1}{2} & \text{for } x_i = 0, 1 \\ 0 & \text{elsewhere} \end{cases}$$

Since the n random variables are independent, their joint probability distribution is given by

$$f(x_1, x_2, \ldots, x_n) = f_1(x_1) \cdot f_2(x_2) \cdot \ldots \cdot f_n(x_n) = \left(\tfrac{1}{2}\right)^{n}$$

where each xᵢ can take on the value 0 or 1. ▲


EXAMPLE 3.27 


Given the independent random variables x₁, x₂, and x₃ with the probability densities

$$f_1(x_1) = \begin{cases} e^{-x_1} & \text{for } x_1 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

$$f_2(x_2) = \begin{cases} 2e^{-2x_2} & \text{for } x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

$$f_3(x_3) = \begin{cases} 3e^{-3x_3} & \text{for } x_3 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find their joint probability density, and use it to evaluate the probability P(x₁ + x₂ ≤ 1, x₃ > 1).

Solution
According to Definition 3.14, the values of the joint probability density are given by

$$f(x_1, x_2, x_3) = f_1(x_1) \cdot f_2(x_2) \cdot f_3(x_3) = e^{-x_1} \cdot 2e^{-2x_2} \cdot 3e^{-3x_3} = 6e^{-x_1 - 2x_2 - 3x_3}$$

for x₁ > 0, x₂ > 0, x₃ > 0, and f(x₁, x₂, x₃) = 0 elsewhere. Thus,

$$P(x_1 + x_2 \le 1, x_3 > 1) = \int_{1}^{\infty} \int_{0}^{1} \int_{0}^{1 - x_1} 6e^{-x_1 - 2x_2 - 3x_3}\, dx_2\, dx_1\, dx_3 = (1 - 2e^{-1} + e^{-2})e^{-3} = 0.020$$

▲
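The closed form can be cross-checked by simulation. This is a minimal sketch. It uses the fact that the three densities are exponential with rates 1, 2, and 3, so the standard library's `random.Random.expovariate` can sample them. The seed and sample size are arbitrary choices of ours. The Monte Carlo estimate only agrees with 0.020 to within sampling error.

```python
import math
import random

# Exact value from Example 3.27: (1 - 2e^{-1} + e^{-2}) e^{-3}.
exact = (1 - 2 * math.exp(-1) + math.exp(-2)) * math.exp(-3)
print(round(exact, 4))  # 0.0199

# Monte Carlo check: x1, x2, x3 are independent exponentials with
# rates 1, 2, 3 (densities e^{-x1}, 2e^{-2x2}, 3e^{-3x3}).
rng = random.Random(12345)  # fixed seed so the run is reproducible
n = 200_000
hits = 0
for _ in range(n):
    x1 = rng.expovariate(1.0)
    x2 = rng.expovariate(2.0)
    x3 = rng.expovariate(3.0)
    if x1 + x2 <= 1 and x3 > 1:
        hits += 1
print(hits / n)  # close to 0.020; the exact digits depend on the seed
```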


THEORETICAL EXERCISES 


1. If the values of the joint probability distribution of x and y are as shown in 
the following table 


SENI 
stc j 
x50 1:30 4 
jen 0 


find 
(a) the marginal distribution of x; 
(b) the marginal distribution of y; 
(c) the conditional distribution of x given y = —1. 
2. With reference to Exercise 1 on page 113, find 
(a) the marginal distribution of x; 
(b) the marginal distribution of y;
(c) the conditional distribution of x given y = 1;
(d) the conditional distribution of y given x = 0.
3. Given the joint probability distribution

$$f(x, y, z) = \frac{xyz}{108} \quad \text{for } x = 1, 2, 3;\ y = 1, 2, 3;\ z = 1, 2$$


find 
(a) the joint marginal distribution of x and y; 
(b) the joint marginal distribution of x and z; 
(c) the marginal distribution of x; 
(d) the conditional distribution of z given x = 1 and y = 2; 
(e) the joint conditional distribution of y and z given x = 3. 
4. Check whether the random variables x and y are independent, if their joint 
probability distribution is given by 
(a) f(x, y) = 1/4 for x = −1 and y = −1, x = −1 and y = 1, x = 1 and y = −1, and x = 1 and y = 1;
(b) f(x, y) = 1/3 for x = 0 and y = 0, x = 0 and y = 1, and x = 1 and y = 1.
5. With reference to Example 3.20 on page 118, find
(a) the marginal distribution function of x, namely, the function given by G(x) = P(x ≤ x) for −∞ < x < ∞;
(b) the conditional distribution function of x given y = 1, namely, the function given by F(x|1) = P(x ≤ x | y = 1) for −∞ < x < ∞.
6. If the joint density function of x and y is given by 


fe 9 = [ier * n for0<x<1,0<y<2 
0 


elsewhere 


find 

(а) the marginal density of x; 

(b) the marginal density of y; 

(c) the conditional density of x given y — 
(d) the conditional density of y given X — 


7. If the random variables x and y have the joint density function given by 


am = 


juu bor forx>0,y>0x+y<1 
(мо) 0 elsewhere 


find 

(a) the marginal density of x; 

(b) the marginal density of y; 

Also determine whether the two random variables are independent. 
8. With reference to the joint density of Exercise 10 on page 114, find
(a) the marginal density of x;
(b) the marginal density of y.
Also determine whether the two random variables are independent.

9. With reference to Example 3.22 on page 121, find
(a) the conditional density of x; given x, — 1 and x, = 2;
(b) the joint conditional density of x; and x; given x, = 5.

10. If F(x, y) is the value of the joint distribution function of the random variables x and y at (x, y), show that the marginal distribution function of x is given by

$$G(x) = F(x, \infty) \quad \text{for } -\infty < x < \infty$$

Use this result to find the marginal distribution function of x for two random variables x and y having the joint distribution function of Exercise 12 on page 115.

11. If F(x₁, x₂, x₃) is the value of the joint distribution function of the random variables x₁, x₂, and x₃ at (x₁, x₂, x₃), show that the joint marginal distribution function of x₁ and x₃ is given by

$$M(x_1, x_3) = F(x_1, \infty, x_3) \quad \text{for } -\infty < x_1 < \infty,\ -\infty < x_3 < \infty$$

and that the marginal distribution function of x₁ is given by

$$G(x_1) = F(x_1, \infty, \infty) \quad \text{for } -\infty < x_1 < \infty$$

Use these results to find the joint marginal distribution function of x₁ and x₃, and the marginal distribution function of x₁, for three random variables x₁, x₂, and x₃ having the joint distribution function obtained in Example 3.19 on page 112.

12. With reference to Example 3.22 on page 121, verify that the three random variables are not independent, but that the two random variables x₁ and x₃, and also the two random variables x₂ and x₃, are pairwise independent.

13. If x and y have the joint density function


ope cm foc x «2,22«y «4 
; 0 elsewhere 


determine the value of P(x « ily < 3). 


14. Given the independent random variables x and y with the probability densities


2 fo0«x«2 
cà 
х) = 
fe) | elsewhere 
! ^ fo0cyc3 
AA 3 
TO= B elsewhere 


find 
(a) the joint probability density of x and y; 
(b) the probability P(x² + y > 1).


APPLIED EXERCISES 


15. Two cards are drawn without replacement from an ordinary deck of 52 playing cards. If z is the number of aces obtained in the first draw and w is the total number of aces obtained in both draws, find

(a) the joint probability distribution of z and w;

(b) the marginal distribution of z;

(c) the marginal distribution of w;

(d) the conditional distribution of w given z = 1.

16. With reference to Exercise 22 on page 116, find

(a) the marginal distribution of x;

(b) the marginal distribution of y;

(c) the conditional distribution of x given y = 1;

(d) the conditional distribution of y given x = 0.

17. If x is the proportion of persons who will respond to one kind of mail-order solicitation, y is the proportion of persons who will respond to another kind of mail-order solicitation, and the joint probability density function of x and y is given by

$$f(x, y) = \begin{cases} \tfrac{2}{5}(x + 4y) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) the marginal density of x;

(b) the probability that there will be at least a 30 percent response to the first kind of mail-order solicitation;

(c) the conditional density of y given x = x;

(d) the probability that there will be at most a 50 percent response to the second kind of mail-order solicitation given that there has been only a 20 percent response to the first kind of mail-order solicitation.
18. With reference to Exercise 26 on page 117, find

(a) the marginal density function of p;

(b) the probability that the price per unit will be less than 28 cents;

(c) the conditional density function of s given p = p;

(d) the probability that sales will be less than 30,000 units when p = 25 cents.

19. If x is the amount (in dollars) a salesperson spends on gasoline during a day and y is the amount (in dollars) for which the salesperson is reimbursed, and the joint density of these random variables is given by

$$f(x, y) = \begin{cases} \dfrac{1}{25}\left(\dfrac{20 - x}{x}\right) & \text{for } 10 < x < 20,\ \tfrac{x}{2} < y < x \\ 0 & \text{elsewhere} \end{cases}$$

find

(a) the marginal densities of x and y;

(b) the conditional density of y given x = 12;

(c) the probability that the salesperson will be reimbursed at least $8 when spending $12.

20. Show that the two random variables of Exercise 25 on page 117 are not independent.

21. The useful life (in hours) of a certain kind of vacuum tube is a random variable having the probability density

$$f(x) = \begin{cases} \dfrac{20{,}000}{(x + 100)^3} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

If three of these tubes operate independently, find

(a) the joint probability density function of x₁, x₂, and x₃, representing the lengths of their useful lives;

(b) the probability P(x₁ < 100, x₂ < 100, x₃ ≥ 200).
Mathematical Expectation


INTRODUCTION 


Originally, the concept of a mathematical expectation arose in connection with games of chance, and in its simplest form it is the product of the amount a player stands to win and the probability that he or she will win. For instance, if we hold one of 10,000 tickets in a raffle for which the grand prize is a car worth $4,800, our mathematical expectation is 4,800 · (1/10,000) = $0.48. This figure will have to be interpreted in the sense of an average: altogether the 10,000 tickets pay $4,800, or on the average 4,800/10,000 = $0.48 per ticket.

If there is also a second prize worth $1,200 and a third prize worth $400, we can argue that altogether the 10,000 tickets pay $4,800 + $1,200 + $400 = $6,400, or on the average 6,400/10,000 = $0.64 per ticket. Looking at this in a different way, we could argue that if the raffle is repeated many times, we would lose 99.97 percent of the time (or with probability 0.9997) and win each of the prizes 0.01 percent of the time (or with probability 0.0001). On the average we would thus win

0(0.9997) + 4,800(0.0001) + 1,200(0.0001) + 400(0.0001) = $0.64

which is the sum of the products obtained by multiplying each amount by the corresponding probability.
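The last computation is a probability-weighted sum, which is easy to reproduce. This is a minimal sketch of the raffle with one ticket out of 10,000 and three prizes. The dictionary name `outcomes` is our own.

```python
from fractions import Fraction

# Raffle: one ticket out of 10,000; prizes of $4,800, $1,200 and $400.
p_prize = Fraction(1, 10_000)
outcomes = {0: 1 - 3 * p_prize, 4800: p_prize, 1200: p_prize, 400: p_prize}

# Sum of each amount times its probability.
expectation = sum(amount * prob for amount, prob in outcomes.items())
print(float(expectation))  # 0.64
```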


4.2 THE EXPECTED VALUE
OF A RANDOM VARIABLE 


In the illustration of the preceding section, the amount we stood to win was a random variable, and the mathematical expectation of this random variable was the sum of the products obtained by multiplying each value of the random variable by the corresponding probability. Referring to the mathematical expectation of a random variable simply as its expected value, and extending the definition to the continuous case by replacing the operation of summation by integration, we thus have


DEFINITION 4.1 If x is a discrete random variable and f(x) is the value of its probability distribution at x, the expected value of this random variable is

$$E(x) = \sum_{x} x \cdot f(x)$$

Correspondingly, if x is a continuous random variable and f(x) is the value of its probability density at x, the expected value of this random variable is

$$E(x) = \int_{-\infty}^{\infty} x \cdot f(x)\, dx$$


In this definition it is assumed, of course, that the sum or integral exists; otherwise, 
the mathematical expectation does not exist. 


EXAMPLE 4.1 


A lot of twelve television sets includes two that are defective. If three of the sets are chosen at random for shipment to a hotel, how many defective sets can they expect?

Solution

We can select x of the 2 defective sets and 3 − x of the 10 good sets in $\binom{2}{x}\binom{10}{3 - x}$ ways, and we can select 3 of the 12 sets in $\binom{12}{3}$ ways. Assuming that the $\binom{12}{3}$ possibilities are all equally likely, we find that the probability distribution of x, the number of defective sets shipped to the hotel, is given by

$$f(x) = \frac{\binom{2}{x}\binom{10}{3 - x}}{\binom{12}{3}} \quad \text{for } x = 0, 1, 2$$

or, in tabular form,

    x       0       1       2
    f(x)   6/11    9/22    1/22

Now,

$$E(x) = 0 \cdot \frac{6}{11} + 1 \cdot \frac{9}{22} + 2 \cdot \frac{1}{22} = \frac{1}{2}$$

and since they cannot possibly get half a defective set, it should be clear that the term "expect" is not used in its colloquial sense. Indeed, it must be interpreted as an average pertaining to repeated shipments made under the given conditions. ▲

EXAMPLE 4.2

Certain coded measurements of the pitch diameter of threads of a fitting have the probability density

$$f(x) = \begin{cases} \dfrac{4}{\pi(1 + x^2)} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

Find the expected value of this random variable.
Solution

Using Definition 4.1, we have

$$E(x) = \int_{0}^{1} x \cdot \frac{4}{\pi(1 + x^2)}\, dx = \frac{4}{\pi} \int_{0}^{1} \frac{x}{1 + x^2}\, dx = \frac{\ln 4}{\pi} \approx 0.4413$$

▲
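The expectations in Examples 4.1 and 4.2 can be checked directly from Definition 4.1. This is a minimal sketch. The discrete case uses exact fractions with `math.comb`. The continuous case uses a midpoint-rule approximation, which is our own choice of method, and compares it with ln 4/π.

```python
import math
from fractions import Fraction

# Example 4.1: x = number of defective sets among 3 drawn from 12 (2 defective).
f = {x: Fraction(math.comb(2, x) * math.comb(10, 3 - x), math.comb(12, 3))
     for x in range(3)}
print(f)                                 # {0: Fraction(6, 11), 1: Fraction(9, 22), 2: Fraction(1, 22)}
print(sum(x * p for x, p in f.items()))  # 1/2

# Example 4.2: E(x) is the integral over (0, 1) of x * 4 / (pi (1 + x^2)),
# approximated with the midpoint rule and compared with ln 4 / pi.
n = 10_000
step = 1 / n
midpoints = [(k + 0.5) * step for k in range(n)]
approx = step * sum(x * 4 / (math.pi * (1 + x**2)) for x in midpoints)
print(round(approx, 4), round(math.log(4) / math.pi, 4))  # 0.4413 0.4413
```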


In many problems of statistics we are interested not only in the expected value of a random variable x, but also in the expected values of random variables related to x. Thus, we might be interested in the random variable y, whose values are related to those of x by means of the equation y = g(x). To simplify our notation we denote this random variable by g(x); for instance, g(x) might be x³, so that when x takes on the value 2, g(x) takes on the value 8. If we want to find the expected value of such a random variable g(x), we could first find its probability distribution or probability density (by methods to be discussed in Chapter 7) and then use Definition 4.1, but it is usually easier and more straightforward to use the following theorem:


THEOREM 4.1 If x is a discrete random variable and f(x) is the value of its probability distribution at x, the expected value of the random variable g(x) is given by

$$E[g(x)] = \sum_{x} g(x) \cdot f(x)$$

Correspondingly, if x is a continuous random variable and f(x) is the value of its probability density at x, the expected value of the random variable g(x) is given by

$$E[g(x)] = \int_{-\infty}^{\infty} g(x) \cdot f(x)\, dx$$


Proof. Since a more general proof is beyond the scope of this text, we shall prove this theorem here only for the case where x is discrete and takes on a finite set of values. Since y = g(x) does not necessarily define a one-to-one correspondence, suppose that g(x) takes on the value $g_i$ when x takes on the values $x_{i1}, x_{i2}, \ldots, x_{in_i}$. Then the probability that g(x) will take on the value $g_i$ is

$$P[g(x) = g_i] = \sum_{j=1}^{n_i} f(x_{ij})$$

and if g(x) takes on the values $g_1, g_2, \ldots, g_m$, it follows that

$$E[g(x)] = \sum_{i=1}^{m} g_i \cdot P[g(x) = g_i] = \sum_{i=1}^{m} g_i \sum_{j=1}^{n_i} f(x_{ij}) = \sum_{i=1}^{m} \sum_{j=1}^{n_i} g_i \cdot f(x_{ij}) = \sum_{x} g(x) \cdot f(x)$$

where the summation extends over all values of x. ▼
EXAMPLE 4.3 


If x is the number of points rolled with a balanced die, find the expected value of the random variable g(x) = 2x² + 1.

Solution
Since each possible outcome has the probability 1/6, we get

$$E[g(x)] = \sum_{x=1}^{6} (2x^2 + 1) \cdot \frac{1}{6} = (2 \cdot 1^2 + 1) \cdot \frac{1}{6} + \cdots + (2 \cdot 6^2 + 1) \cdot \frac{1}{6} = \frac{94}{3}$$

▲
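The proof of Theorem 4.1 groups the values of x that give the same value of g(x). The following minimal sketch computes E[g(x)] for the die of Example 4.3 in two ways. The first is the direct sum of Theorem 4.1. The second builds the distribution of g(x) and then applies Definition 4.1. The variable names are our own.

```python
from collections import defaultdict
from fractions import Fraction

# Balanced die: f(x) = 1/6 for x = 1, ..., 6; g(x) = 2x^2 + 1 as in Example 4.3.
f = {x: Fraction(1, 6) for x in range(1, 7)}


def g(x):
    return 2 * x**2 + 1


# Theorem 4.1: sum g(x) * f(x) directly over the values of x.
direct = sum(g(x) * p for x, p in f.items())

# Definition 4.1 applied to g(x): first build the distribution of g(x)
# (grouping x values with the same g-value), then sum g_i * P[g(x) = g_i].
dist_g = defaultdict(Fraction)
for x, p in f.items():
    dist_g[g(x)] += p
via_distribution = sum(gi * p for gi, p in dist_g.items())

print(direct, via_distribution)  # 94/3 94/3
```

Here g is one-to-one on 1, ..., 6, so each group has one member. The grouping step does real work only for a function such as g(x) = x² on values of x of both signs, as in Exercise 1 below.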

EXAMPLE 4.4

If x has the probability density

$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find the expected value of the random variable g(x) = $e^{3x/4}$.

Solution
According to Theorem 4.1, we have

$$E\left[e^{3x/4}\right] = \int_{0}^{\infty} e^{3x/4} \cdot e^{-x}\, dx = \int_{0}^{\infty} e^{-x/4}\, dx = 4$$

▲


The determination of mathematical expectations can often be simplified by 
using the following theorems, which enable us to calculate expected values from 
other known or easily computed expectations. Since the steps are essentially the 
same, some proofs will be given either for the discrete case or the continuous 
case; others are left for the reader as exercises. 


THEOREM 4.2 If a and b are constants, then

$$E(ax + b) = aE(x) + b$$

Proof. Using Theorem 4.1 with g(x) = ax + b, we get

$$E(ax + b) = \int_{-\infty}^{\infty} (ax + b) \cdot f(x)\, dx = a\int_{-\infty}^{\infty} x \cdot f(x)\, dx + b\int_{-\infty}^{\infty} f(x)\, dx = aE(x) + b$$

▼

If we set, respectively, b = 0 and a = 0, it follows from Theorem 4.2 that

COROLLARY 1 If a is a constant, then

$$E(ax) = aE(x)$$

COROLLARY 2 If b is a constant, then

$$E(b) = b$$
Observe that if we write E(b), the constant b may be looked upon as a random
variable which always takes on the value b. 


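THEOREM 4.3 If c₁, c₂, ..., and cₙ are constants, then

$$E\left[\sum_{i=1}^{n} c_i g_i(x)\right] = \sum_{i=1}^{n} c_i E[g_i(x)]$$

Proof. According to Theorem 4.1 with $g(x) = \sum_{i=1}^{n} c_i g_i(x)$, we get

$$E\left[\sum_{i=1}^{n} c_i g_i(x)\right] = \sum_{x}\left[\sum_{i=1}^{n} c_i g_i(x)\right] f(x) = \sum_{i=1}^{n} \sum_{x} c_i g_i(x) f(x) = \sum_{i=1}^{n} c_i \sum_{x} g_i(x) f(x) = \sum_{i=1}^{n} c_i E[g_i(x)]$$

▼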


EXAMPLE 4.5 


Making use of the fact that

$$E(x^2) = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) \cdot \frac{1}{6} = \frac{91}{6}$$

rework Example 4.3.

Solution

$$E(2x^2 + 1) = 2E(x^2) + 1 = 2 \cdot \frac{91}{6} + 1 = \frac{94}{3}$$

▲
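EXAMPLE 4.6
If the probability density of x is given by

$$f(x) = \begin{cases} 2(1 - x) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

show that

$$E(x^r) = \frac{2}{(r + 1)(r + 2)}$$

and use this result to evaluate E[(2x + 1)²].

Solution

$$E(x^r) = \int_{0}^{1} x^r \cdot 2(1 - x)\, dx = 2\int_{0}^{1} (x^r - x^{r+1})\, dx = 2\left(\frac{1}{r + 1} - \frac{1}{r + 2}\right) = \frac{2}{(r + 1)(r + 2)}$$

Since

$$E[(2x + 1)^2] = 4E(x^2) + 4E(x) + 1$$

and substitution of r = 1 and r = 2 into the preceding formula gives E(x) = 2/(2 · 3) = 1/3 and E(x²) = 2/(3 · 4) = 1/6, we get

$$E[(2x + 1)^2] = 4 \cdot \frac{1}{6} + 4 \cdot \frac{1}{3} + 1 = 3$$

▲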
EXAMPLE 4.7 
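Show that

$$E[(ax + b)^n] = \sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^i E(x^{n-i})$$

Solution

According to Theorem 1.9, we can write

$$(ax + b)^n = \sum_{i=0}^{n} \binom{n}{i} (ax)^{n-i} b^i$$

and it follows from Theorem 4.3 that

$$E[(ax + b)^n] = E\left[\sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^i x^{n-i}\right] = \sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^i E(x^{n-i})$$

▲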
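Examples 4.6 and 4.7 fit together. The moment formula E(x^r) = 2/((r + 1)(r + 2)) can be fed into the binomial expansion to obtain E[(2x + 1)²] = 3 without a new integration. This is a minimal sketch. The helper names `moment` and `expect_power` are hypothetical. The final midpoint-rule spot check is our own addition.

```python
from fractions import Fraction
from math import comb


# Moments of f(x) = 2(1 - x) on (0, 1), using the formula of Example 4.6.
def moment(r):
    """E(x^r) = 2 / ((r + 1)(r + 2))."""
    return Fraction(2, (r + 1) * (r + 2))


def expect_power(a, b, n, moment):
    """E[(ax + b)^n] = sum over i of C(n, i) a^(n-i) b^i E(x^(n-i)) (Example 4.7)."""
    return sum(comb(n, i) * a ** (n - i) * b**i * moment(n - i) for i in range(n + 1))


print(moment(1), moment(2))           # 1/3 1/6
print(expect_power(2, 1, 2, moment))  # 3

# Numerical spot check of the moment formula for r = 3 (exact value 1/10).
n = 10_000
approx = sum(((k + 0.5) / n) ** 3 * 2 * (1 - (k + 0.5) / n) for k in range(n)) / n
print(round(approx, 6))  # 0.1
```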


The concept of mathematical expectation can easily be extended to situations 
involving more than one random variable. For instance, if z is the random variable 
whose values are related to those of the two random variables x and y by means 
of the equation z = g(x, y), it can be shown that 
THEOREM 4.4 If x and y are discrete random variables and f(x, y) is the value of their joint probability distribution at (x, y), the expected value of the random variable g(x, y) is given by

$$E[g(x, y)] = \sum_{x} \sum_{y} g(x, y) \cdot f(x, y)$$

Correspondingly, if x and y are continuous random variables and f(x, y) is the value of their joint density at (x, y), the expected value of the random variable g(x, y) is given by

$$E[g(x, y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) \cdot f(x, y)\, dx\, dy$$


Generalization of this theorem for functions of any finite number of random 
variables is straightforward. 


EXAMPLE 4.8 
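With reference to Example 3.12 on page 103, find the expected value of g(x, y) = x + y.
Solution

$$E(x + y) = \sum_{x=0}^{2} \sum_{y=0}^{2} (x + y) \cdot f(x, y) = (0 + 0) \cdot \frac{1}{6} + (0 + 1) \cdot \frac{2}{9} + (0 + 2) \cdot \frac{1}{36} + (1 + 0) \cdot \frac{1}{3} + (1 + 1) \cdot \frac{1}{6} + (2 + 0) \cdot \frac{1}{12} = \frac{10}{9}$$

▲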
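The double sum of Theorem 4.4 runs over the nonzero cells of the joint table. This is a minimal sketch that uses the table of Example 3.12 (the same entries as in Example 3.20). As a cross-check it also computes E(x) + E(y), which must agree by the linearity result stated as Theorem 4.5.

```python
from fractions import Fraction

# Joint distribution of Example 3.12 (aspirin x, sedative y); missing cells are 0.
f = {(0, 0): Fraction(1, 6), (0, 1): Fraction(2, 9), (0, 2): Fraction(1, 36),
     (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6), (2, 0): Fraction(1, 12)}

# Theorem 4.4 with g(x, y) = x + y.
e_sum = sum((x + y) * p for (x, y), p in f.items())
print(e_sum)  # 10/9

# Cross-check via linearity: E(x + y) = E(x) + E(y).
e_x = sum(x * p for (x, _), p in f.items())
e_y = sum(y * p for (_, y), p in f.items())
print(e_x, e_y, e_x + e_y)  # 2/3 4/9 10/9
```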
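EXAMPLE 4.9

If the joint density function of x and y is given by

$$f(x, y) = \begin{cases} \tfrac{2}{7}(x + 2y) & \text{for } 0 < x < 1,\ 1 < y < 2 \\ 0 & \text{elsewhere} \end{cases}$$

find the expected value of g(x, y) = x/y³.

Solution

$$E(x/y^3) = \int_{1}^{2} \int_{0}^{1} \frac{2x(x + 2y)}{7y^3}\, dx\, dy = \frac{2}{7} \int_{1}^{2} \left(\frac{1}{3y^3} + \frac{1}{y^2}\right) dy = \frac{5}{28}$$

▲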

The following is another theorem which finds useful applications in sub- 
sequent work. It is a generalization of Theorem 4.3, and its proof parallels the 
one of that theorem. 


THEOREM 4.5 If c₁, c₂, ..., and cₙ are constants, then

$$E\left[\sum_{i=1}^{n} c_i g_i(x_1, x_2, \ldots, x_k)\right] = \sum_{i=1}^{n} c_i E[g_i(x_1, x_2, \ldots, x_k)]$$


THEORETICAL EXERCISES 


1. To illustrate the proof of Theorem 4.1 with an example, consider the random variable x which takes on the values −2, −1, 0, 1, 2, and 3 with the respective probabilities f(−2), f(−1), f(0), f(1), f(2), and f(3). If g(x) = x², find
(a) the four possible values g₁, g₂, g₃, and g₄ of g(x);
(b) the probabilities P[g(x) = gᵢ] for i = 1, 2, 3, 4;
(c) E[g(x)] as the sum of gᵢ · P[g(x) = gᵢ] over i = 1, 2, 3, 4, and show that it equals the sum of g(x) · f(x) over all values of x.

2. Prove Theorem 4.2 for the discrete case.

3. Prove Theorem 4.3 for the continuous case.

4. Prove Theorem 4.5 for the discrete case.

5. Given the two continuous random variables x and y, use Theorem 4.4 to express E(x) in terms of
(a) the joint density of x and y;
(b) the marginal density of x.

6. Find the expected value of the random variable x having the probability distribution

$$f(x) = \frac{|x - 2|}{7} \quad \text{for } x = -1, 0, 1, 3$$




7. Find the expected value of the random variable y whose probability density is given by

$$f(y) = \begin{cases} \tfrac{1}{8}(y + 1) & \text{for } 2 < y < 4 \\ 0 & \text{elsewhere} \end{cases}$$


8. Find the expected value of the random variable x whose probability density is given by

$$f(x) = \begin{cases} x & \text{for } 0 < x < 1 \\ 2 - x & \text{for } 1 \le x < 2 \\ 0 & \text{elsewhere} \end{cases}$$


9. The random variable x takes on the values 0, 1, 2, and 3 with respective probabilities of 1/125, 12/125, 48/125, and 64/125.
(a) Find E(x) and E(x²).
(b) Use the results of part (a) to find E[(3x + 2)²].


10. The density function of the continuous random variable x is given by

$$f(x) = \begin{cases} \dfrac{1}{x \ln 3} & \text{for } 1 < x < 3 \\ 0 & \text{elsewhere} \end{cases}$$

(a) Find E(x), E(x²), and E(x³).
(b) Use the results of part (a) to find the value of E(x³ + 2x² − 3x + 1).
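11. If the density function of the random variable x is given by

$$f(x) = \begin{cases} \tfrac{x}{2} & \text{for } 0 < x \le 1 \\ \tfrac{1}{2} & \text{for } 1 < x \le 2 \\ \tfrac{3 - x}{2} & \text{for } 2 < x < 3 \\ 0 & \text{elsewhere} \end{cases}$$

find the expected value of g(x) = x² − 5x + 3.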
12. With reference to Exercise 5 on page 114, find E(2x — y). 
13, With reference to Exercise 10 on page 114, find E(x/y). 


14. If x, y, and z have the joint probability distribution of Exercise 17 on page 115, find the expected value of the random variable \(u = x + y + z\).

15. If x, y, and z have the joint probability density of Exercise 20 on page 116,
find the expected value of the random variable w — x^ — yz. 

16. If x has the probability distribution \(f(x) = \left(\frac{1}{2}\right)^x\) for \(x = 1, 2, 3, \ldots\), show that \(E(2^x)\) does not exist. This is the famous Petersburg paradox, according to which a player's expectation is infinite (does not exist) if he is to receive \(2^x\) dollars when, in a series of flips of a balanced coin, the first head appears on the xth flip.
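The divergence in Exercise 16 can be seen numerically: with \(f(x) = (\frac{1}{2})^x\), every term \(2^x \cdot f(x)\) of the series for \(E(2^x)\) equals exactly 1, so the partial sums grow without bound. The following Python sketch is an illustration, not the requested proof; the helper name `petersburg_partial_sum` is hypothetical, chosen only for this example.

```python
from fractions import Fraction


def petersburg_partial_sum(n: int) -> Fraction:
    """Partial sum of 2**x * f(x) for x = 1, ..., n, with f(x) = (1/2)**x."""
    total = Fraction(0)
    for x in range(1, n + 1):
        f_x = Fraction(1, 2) ** x   # probability that the first head appears on flip x
        total += 2 ** x * f_x       # payoff 2**x weighted by that probability
    return total


for n in (10, 100, 1000):
    # Every term equals exactly 1, so the partial sum equals n and never settles.
    print(n, petersburg_partial_sum(n))
```

Exact fractions are used so that each term is visibly 1, rather than a floating-point value close to 1.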


APPLIED EXERCISES 


17. The probability that Ms. Brown will sell a piece of property at a profit of $3,000 is $, the probability that she will sell it at a profit of $1,500 is 55, the probability that she will break even is J, and the probability that she will lose $1,500 is 35. What is her expected profit?

18. A game of chance is considered fair, or equitable, if each player's expectation is equal to zero. If someone pays us $10 each time we roll a 3 or a 4 with a balanced die, how much should we pay that person when we roll a 1, 2, 5, or 6 to make the game equitable?

19. The manager of a bakery knows that the number of chocolate cakes he can sell on any given day is a random variable having the probability distribution \(f(x) = \frac{1}{6}\) for x = 0, 1, 2, 3, 4, and 5. He also knows that there is a profit of $1.00 for each cake which he sells and a loss (due to spoilage) of $0.40 for each cake he does not sell. Assuming that each cake can be sold only on the day it is made, find the baker's expected profit for a day on which he bakes
(a) 3 of the cakes;

(b) 4 of the cakes;

(c) 5 of the cakes.

20. If a contractor's profit on a construction job can be looked upon as a continuous random variable having the probability density
\[ f(x) = \begin{cases} \frac{1}{18}(x + 1) & \text{for } -1 < x < 5 \\ 0 & \text{elsewhere} \end{cases} \]
where the units are in thousand dollars, what is his expected profit?

21. With reference to Exercise 19 on page 102, what tread wear can a car owner expect to get with one of the tires?

22. With reference to Exercise 20 on page 102, what is the city's expected water consumption for any given day?
23. With reference to Exercise 26 on page 117, find \(E(ps)\), the expected receipts for the commodity.


24. Mr. Adams and Ms. Smith are betting on repeated flips of a coin. At the 
start of the game Mr. Adams has a dollars, Ms. Smith has b dollars, at each 
flip the loser pays the winner one dollar, and the game continues until either 
player is "ruined." Making use of the fact that in an equitable game each player's mathematical expectation is zero, find the probability that Mr. Adams will win Ms. Smith's b dollars before he loses his a dollars.


4.3 MOMENTS


Among the mathematical expectations that are of special importance in statistics, 
there are the moments of the distribution of a random variable, or simply the 
moments of a random variable. 


DEFINITION 4.2 The rth moment about the origin of the random variable x, denoted by \(\mu'_r\), is the expected value of \(x^r\); symbolically,
\[ \mu'_r = E(x^r) = \sum_x x^r \cdot f(x) \]
for \(r = 0, 1, 2, 3, \ldots\), when x is discrete, and
\[ \mu'_r = E(x^r) = \int_{-\infty}^{\infty} x^r \cdot f(x)\,dx \]
when x is continuous.
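For a discrete random variable, Definition 4.2 is a single weighted sum. A minimal Python sketch, assuming the distribution is stored as a dictionary mapping each value x to f(x) (a representation chosen only for illustration), with exact fractions so the moments come out as exact rationals; the balanced die is used as the example distribution:

```python
from fractions import Fraction


def raw_moment(pmf: dict, r: int) -> Fraction:
    """mu'_r = sum over x of x**r * f(x), for a discrete pmf given as {x: f(x)}."""
    return sum((Fraction(x) ** r * p for x, p in pmf.items()), Fraction(0))


# Illustrative pmf (an assumption for this sketch): points rolled with a balanced die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print([str(raw_moment(die, r)) for r in range(4)])  # ['1', '7/2', '91/6', '147/2']
```

Note that \(\mu'_0 = 1\) appears automatically, because the probabilities sum to 1.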


It is of interest to note that the term "moment" comes from the field of physics: if the quantities f(x) in the discrete case were point masses acting perpendicularly to the x-axis at distances x from the origin, \(\mu'_1\) would be the x-coordinate of the center of gravity, namely, the first moment divided by \(\sum f(x) = 1\), and \(\mu'_2\) would be the moment of inertia. This also explains why the moments \(\mu'_r\) are called moments about the origin: in the analogy to physics, the length of the lever arm is in each case the distance from the origin. The analogy applies also in the continuous case, where \(\mu'_1\) and \(\mu'_2\) might be the x-coordinate of the center of gravity and the moment of inertia of a rod of variable density.

When r = 0, we have \(\mu'_0 = E(x^0) = E(1) = 1\), by Corollary 2 of Theorem 4.2, and this is as it should be in accordance with Theorems 3.1 and 3.5. When r = 1, we have \(\mu'_1 = E(x)\), which is just the expected value of the random variable x itself; in view of its importance in statistics, we give it a special symbol and a special name.


DEFINITION 4.3 \(\mu'_1\) is called the mean of the distribution of x, or simply the mean of x, and it is denoted by \(\mu\).


The special moments we shall define next are of importance in statistics 
because they serve to describe the shape of the distribution of a random variable, 
namely, the shape of the graph of its probability distribution or probability density. 


DEFINITION 4.4 The rth moment about the mean of the random variable x, denoted by \(\mu_r\), is the expected value of \((x - \mu)^r\); symbolically,
\[ \mu_r = E[(x - \mu)^r] = \sum_x (x - \mu)^r \cdot f(x) \]
for \(r = 0, 1, 2, 3, \ldots\), when x is discrete, and
\[ \mu_r = E[(x - \mu)^r] = \int_{-\infty}^{\infty} (x - \mu)^r \cdot f(x)\,dx \]
when x is continuous.

Note that \(\mu_0 = 1\) and \(\mu_1 = 0\) for any random variable for which \(\mu\) exists (see Exercise 1 on page 156).

The second moment about the mean is of special importance in statistics 
because it is indicative of the spread or dispersion of the distribution of a random 
variable; thus, it is given a special symbol and a special name. 


DEFINITION 4.5 \(\mu_2\) is called the variance of the distribution of x, or simply the variance of x, and it is denoted by \(\sigma^2\), var(x), or V(x); \(\sigma\), the positive square root of the variance, is called the standard deviation.


Figure 4.1 shows how the variance reflects the spread or dispersion of the distribution of a random variable. Here we show the histograms of the probability distributions of four random variables with the same mean \(\mu = 5\), but variances equaling 5.26, 3.18, 1.66, and 0.88. As can be seen, a small value of \(\sigma^2\) suggests that we are likely to get a value close to the mean, and a large value of \(\sigma^2\) suggests that there is a greater probability of getting a value that is not close to the mean. This will be discussed further in Section 4.4. A brief discussion of how \(\mu_3\), the third moment about the mean, describes the symmetry or skewness (lack of symmetry) of a distribution is given in Exercise 10 on page 157.

[Figure 4.1: four histograms, each with \(\mu = 5\), with \(\sigma^2 = 5.26\), \(\sigma^2 = 3.18\), \(\sigma^2 = 1.66\), and \(\sigma^2 = 0.88\), respectively.]
Figure 4.1 Distributions with different dispersions.

In many instances moments about the mean are obtained by first calculating moments about the origin and then expressing the \(\mu_r\) in terms of the \(\mu'_r\). To serve this purpose, the reader will be asked to verify a general formula in Exercise 9 on page 157. Here, let us merely derive the following computing formula for \(\sigma^2\):

THEOREM 4.6
\[ \sigma^2 = \mu'_2 - \mu^2 \]

Proof.
\[ \begin{aligned} \sigma^2 &= E[(x - \mu)^2] \\ &= E(x^2 - 2\mu x + \mu^2) \\ &= E(x^2) - 2\mu E(x) + E(\mu^2) \\ &= E(x^2) - 2\mu \cdot \mu + \mu^2 \\ &= \mu'_2 - \mu^2 \end{aligned} \]
▼
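EXAMPLE 4.10

Use Theorem 4.6 to calculate the variance of the random variable x, representing the number of points rolled with a balanced die.

Solution

First we compute
\[ \mu = E(x) = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = \tfrac{7}{2} \]
and
\[ \mu'_2 = E(x^2) = 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{6} + \cdots + 6^2 \cdot \tfrac{1}{6} = \tfrac{91}{6} \]
so that
\[ \sigma^2 = \tfrac{91}{6} - \left(\tfrac{7}{2}\right)^2 = \tfrac{35}{12} \]
▲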
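Theorem 4.6 and Definition 4.4 must give the same variance, which is easy to confirm for the die of Example 4.10. A short Python sketch with exact fractions (the dictionary representation of the distribution is our choice, not the text's):

```python
from fractions import Fraction

# Balanced die: f(x) = 1/6 for x = 1, 2, ..., 6
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in die.items())                       # mu = E(x)
mu2_raw = sum(x ** 2 * p for x, p in die.items())             # mu'_2 = E(x^2)
var_shortcut = mu2_raw - mu ** 2                              # Theorem 4.6
var_direct = sum((x - mu) ** 2 * p for x, p in die.items())   # Definition 4.4, r = 2

print(mu, mu2_raw, var_shortcut, var_direct)  # 7/2 91/6 35/12 35/12
```

The shortcut needs only sums of powers of x, which is why it is usually the easier route by hand.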


EXAMPLE 4.11

Find the standard deviation of the random variable x of Example 4.2.

Solution
On page 137 we showed that \(\mu = E(x) = 0.4413\). Now
\[ \mu'_2 = E(x^2) = \frac{4}{\pi}\int_0^1 \frac{x^2}{1 + x^2}\,dx = \frac{4}{\pi}\int_0^1 \left(1 - \frac{1}{1 + x^2}\right)dx = \frac{4}{\pi} - 1 = 0.2732 \]
and it follows that
\[ \sigma^2 = 0.2732 - (0.4413)^2 = 0.0785 \]
and \(\sigma = \sqrt{0.0785} = 0.2802\). ▲
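The numbers in Example 4.11 can be checked by numerical integration. This sketch assumes, as the integrals above indicate, that the density of Example 4.2 is \(f(x) = \frac{4}{\pi(1 + x^2)}\) for \(0 < x < 1\); Simpson's rule is our choice of method, since the text evaluates the integrals in closed form.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def f(x):
    """Assumed density of Example 4.2: 4 / [pi (1 + x^2)] for 0 < x < 1."""
    return 4 / (math.pi * (1 + x * x))


mu = simpson(lambda x: x * f(x), 0.0, 1.0)        # close to 2 ln 2 / pi = 0.4413
mu2 = simpson(lambda x: x * x * f(x), 0.0, 1.0)   # close to 4/pi - 1 = 0.2732
sigma = math.sqrt(mu2 - mu ** 2)                  # Theorem 4.6, then square root
print(round(mu, 4), round(mu2, 4), round(sigma, 4))
```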


The following is another theorem that is of importance in work connected 
with standard deviations or variances: 


THEOREM 4.7 If x has the variance \(\sigma^2\), then
\[ \operatorname{var}(ax + b) = a^2\sigma^2 \]


The proof of this theorem will be left to the reader, but let us point out the 
following corollaries: For a = 1 we find that the addition of a constant to the 
values of a random variable, resulting in a shift of all the values of x to the left or 
to the right, in no way affects the spread of its distribution; for b = 0 we find 
that if the values of a random variable are multiplied by a constant, the variance 
is multiplied by the square of that constant, resulting in a corresponding change 
in the spread of the distribution. 
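Both corollaries of Theorem 4.7 can be spot-checked on a concrete distribution. A minimal sketch, assuming a discrete distribution stored as {x: f(x)} and using the balanced die purely as an example; the helper names are hypothetical:

```python
from fractions import Fraction


def variance(pmf):
    """sigma^2 = E[(x - mu)^2] for a discrete pmf {x: f(x)}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def linear_transform(pmf, a, b):
    """pmf of a*x + b; requires a != 0 so that distinct x map to distinct values."""
    return {a * x + b: p for x, p in pmf.items()}


die = {x: Fraction(1, 6) for x in range(1, 7)}
for a, b in [(1, 10), (3, 0), (-2, 5)]:
    lhs = variance(linear_transform(die, a, b))
    print(a, b, lhs, a ** 2 * variance(die))   # the two columns agree (Theorem 4.7)
```

The case (1, 10) is a pure shift and leaves the variance unchanged; (3, 0) multiplies it by 9.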


4.4 CHEBYSHEV'S THEOREM


To demonstrate how \(\sigma\) or \(\sigma^2\) is indicative of the spread or dispersion of the
distribution of a random variable, let us now prove the following theorem, called 
Chebyshev's theorem after the nineteenth-century Russian mathematician P. L. 
Chebyshev. We shall prove it here only for the continuous case, leaving the 
discrete case as an exercise. 


THEOREM 4.8 (Chebyshev's Theorem) If \(\mu\) and \(\sigma\) are, respectively, the mean and the standard deviation of the random variable x, then for any positive constant k the probability is at least \(1 - \frac{1}{k^2}\) that x will take on a value within k standard deviations of the mean; symbolically,
\[ P(|x - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \]
Proof. From Definitions 4.4 and 4.5 we write
\[ \sigma^2 = E[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx \]
and, dividing the integral into three parts as shown in Figure 4.2, we get
\[ \sigma^2 = \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 \cdot f(x)\,dx + \int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^2 \cdot f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 \cdot f(x)\,dx \]

[Figure 4.2: the x-axis divided at \(\mu - k\sigma\), \(\mu\), and \(\mu + k\sigma\).]
Figure 4.2 Diagram for proof of Chebyshev's theorem.

Since the integrand \((x - \mu)^2 \cdot f(x)\) is non-negative, we can form the inequality
\[ \sigma^2 \ge \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 \cdot f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 \cdot f(x)\,dx \]
by deleting the second integral. Now, since \((x - \mu)^2 \ge k^2\sigma^2\) for \(x \le \mu - k\sigma\) or \(x \ge \mu + k\sigma\), it follows that
\[ \sigma^2 \ge \int_{-\infty}^{\mu - k\sigma} k^2\sigma^2 \cdot f(x)\,dx + \int_{\mu + k\sigma}^{\infty} k^2\sigma^2 \cdot f(x)\,dx \]
and, hence, that
\[ \frac{1}{k^2} \ge \int_{-\infty}^{\mu - k\sigma} f(x)\,dx + \int_{\mu + k\sigma}^{\infty} f(x)\,dx \]
provided \(\sigma^2 \ne 0\). Since the sum of the two integrals in this inequality represents the probability that x will take on a value less than or equal to \(\mu - k\sigma\) or greater than or equal to \(\mu + k\sigma\), we have, thus, shown that
\[ P(|x - \mu| \ge k\sigma) \le \frac{1}{k^2} \]
and it follows that
\[ P(|x - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \]
▼

For instance, the probability is at least \(\frac{3}{4}\) that x will take on a value within two standard deviations of the mean, the probability is at least \(\frac{8}{9}\) that x will take on a value within three standard deviations of the mean, and the probability is at least \(\frac{24}{25}\) that x will take on a value within five standard deviations of the mean. It is in this sense that \(\sigma\) controls the spread or dispersion of the distribution of a random variable. Clearly, the probability given by Chebyshev's theorem is only a lower bound; whether the probability that a given random variable will take on a value within k standard deviations of the mean is actually greater than \(1 - \frac{1}{k^2}\) and, if so, by how much, we cannot say, but Chebyshev's theorem assures us that this probability cannot be less than \(1 - \frac{1}{k^2}\). Only when the distribution of a random variable is known can we calculate the exact probability.
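To see how conservative the bound can be, it helps to compare it with an exact probability for a known distribution. The sketch below uses the density \(f(x) = e^{-x}\) for \(x > 0\) (the density of Example 4.13), whose mean and standard deviation are both 1, so \(P(|x - 1| < k)\) has a closed form. The choice of distribution and the function names are ours, for illustration only.

```python
import math


def chebyshev_bound(k: float) -> float:
    """Lower bound 1 - 1/k^2 for P(|x - mu| < k*sigma) from Theorem 4.8."""
    return 1 - 1 / k ** 2


def exact_exponential(k: float) -> float:
    """Exact P(|x - 1| < k) when f(x) = e^{-x} for x > 0 (mean 1, standard deviation 1)."""
    lower = max(0.0, 1 - k)          # the density vanishes for x <= 0
    return math.exp(-lower) - math.exp(-(1 + k))


for k in (1.5, 2, 3, 5):
    print(k, round(chebyshev_bound(k), 4), round(exact_exponential(k), 4))
```

For k = 2 the bound is 0.75 while the exact probability is \(1 - e^{-3} \approx 0.95\), which illustrates why the theorem is described as giving only a lower bound.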


EXAMPLE 4.12

If the probability density of the random variable x is given by
\[ f(x) = \begin{cases} 630x^4(1 - x)^4 & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases} \]
find the probability that x will take on a value within two standard deviations of the mean and compare it with the lower bound provided by Chebyshev's theorem.

Solution

Straightforward integration shows that \(\mu = \frac{1}{2}\) and \(\sigma^2 = \frac{1}{44}\), so that \(\sigma = \sqrt{1/44} \approx 0.15\). Thus, the probability that x will take on a value within two standard deviations of the mean is the probability that it will take on a value between 0.20 and 0.80, namely,
\[ P(0.20 < x < 0.80) = \int_{0.20}^{0.80} 630x^4(1 - x)^4\,dx = 0.96 \]

Observe that the statement "the probability is 0.96" is a much stronger statement than "the probability is at least 0.75," which is provided by Chebyshev's theorem. ▲
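Because \(630x^4(1 - x)^4\) is a polynomial, every quantity in Example 4.12 can be computed exactly by expanding it and integrating term by term. A Python sketch of that approach (our choice of method); like the text, it uses the rounded interval 0.20 to 0.80 for \(\mu \pm 2\sigma\):

```python
from fractions import Fraction
from math import comb, sqrt

# f(x) = 630 x^4 (1 - x)^4 on (0, 1), expanded as a polynomial:
# 630 x^4 (1 - x)^4 = sum_j 630 C(4, j) (-1)^j x^(4 + j)
COEFFS = {4 + j: 630 * comb(4, j) * (-1) ** j for j in range(5)}


def integral(r: int, lo: Fraction, hi: Fraction) -> Fraction:
    """Exact value of the integral of x^r f(x) dx from lo to hi."""
    total = Fraction(0)
    for power, c in COEFFS.items():
        n = power + r + 1                      # antiderivative of x^(power + r)
        total += Fraction(c, n) * (hi ** n - lo ** n)
    return total


zero, one = Fraction(0), Fraction(1)
mu = integral(1, zero, one)                       # 1/2
var = integral(2, zero, one) - mu ** 2            # Theorem 4.6: 1/44
sigma = sqrt(var)                                 # about 0.15
p = integral(0, Fraction(1, 5), Fraction(4, 5))   # P(0.20 < x < 0.80)
print(mu, var, round(sigma, 4), round(float(p), 4), float(1 - Fraction(1, 4)))
```

The exact probability comes out a little above 0.96, far above the Chebyshev bound of 0.75.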


4.5 MOMENT-GENERATING FUNCTIONS


Although the moments of most distributions can be determined directly by 
evaluating the necessary integrals or sums, there exists an alternative procedure 
which sometimes provides considerable simplifications. This technique utilizes 
moment-generating functions. 


DEFINITION 4.6 The moment-generating function of the random variable x, where it exists, is given by
\[ M_x(t) = E(e^{tx}) = \sum_x e^{tx} \cdot f(x) \]
when x is discrete, and
\[ M_x(t) = E(e^{tx}) = \int_{-\infty}^{\infty} e^{tx} \cdot f(x)\,dx \]
when x is continuous.

The independent variable is t, and we are usually interested in values of t in the neighborhood of 0.
To explain why we refer to this function as a "moment-generating" function, let us substitute for \(e^{tx}\) its Maclaurin's series expansion, namely,
\[ e^{tx} = 1 + tx + \frac{t^2x^2}{2!} + \frac{t^3x^3}{3!} + \cdots + \frac{t^rx^r}{r!} + \cdots \]
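For the discrete case, we thus get
\[ \begin{aligned} M_x(t) &= \sum_x \left[1 + tx + \frac{t^2x^2}{2!} + \cdots + \frac{t^rx^r}{r!} + \cdots\right] f(x) \\ &= \sum_x f(x) + t \cdot \sum_x x f(x) + \frac{t^2}{2!} \cdot \sum_x x^2 f(x) + \cdots + \frac{t^r}{r!} \cdot \sum_x x^r f(x) + \cdots \\ &= 1 + \mu \cdot t + \mu'_2 \cdot \frac{t^2}{2!} + \cdots + \mu'_r \cdot \frac{t^r}{r!} + \cdots \end{aligned} \]
and it can be seen that in the Maclaurin's series of the moment-generating function of x the coefficient of \(\frac{t^r}{r!}\) is \(\mu'_r\), the rth moment about the origin of the random variable x. In the continuous case, the argument is the same.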


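EXAMPLE 4.13

Find the moment-generating function of the random variable x, whose probability density is given by
\[ f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases} \]
and use it to find an expression for \(\mu'_r\).

Solution

By definition
\[ M_x(t) = E(e^{tx}) = \int_0^{\infty} e^{tx} \cdot e^{-x}\,dx = \int_0^{\infty} e^{-x(1 - t)}\,dx = \frac{1}{1 - t} \quad \text{for } t < 1 \]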
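As is well known, when \(|t| < 1\) the Maclaurin's series for this moment-generating function is
\[ \begin{aligned} M_x(t) &= 1 + t + t^2 + t^3 + \cdots + t^r + \cdots \\ &= 1 + 1! \cdot \frac{t}{1!} + 2! \cdot \frac{t^2}{2!} + 3! \cdot \frac{t^3}{3!} + \cdots + r! \cdot \frac{t^r}{r!} + \cdots \end{aligned} \]
and, hence, \(\mu'_r = r!\) for \(r = 0, 1, 2, \ldots\). ▲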
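The conclusion \(\mu'_r = r!\) of Example 4.13 can be cross-checked by integrating \(x^r e^{-x}\) numerically and comparing with r!. This Python sketch uses Simpson's rule and truncates the infinite range at x = 60; both are assumptions of the sketch, and the neglected tail is far below floating-point precision for the moments shown.

```python
import math


def simpson(g, a, b, n=20000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


# Truncating the infinite range at x = 60 is an assumption of this sketch;
# the neglected tail is far below floating-point precision for r <= 5.
moments = [simpson(lambda x, r=r: x ** r * math.exp(-x), 0.0, 60.0) for r in range(6)]
for r, m in enumerate(moments):
    print(r, round(m, 6), math.factorial(r))
```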


The main difficulty in using the Maclaurin's series of a moment-generating function to determine the moments of a random variable is usually not that of finding the moment-generating function, but that of expanding it into a Maclaurin's series. If we are interested only in the first few moments of a random variable, say, \(\mu'_1\) and \(\mu'_2\), their determination can usually be simplified by using the following theorem:

THEOREM 4.9
\[ \left.\frac{d^rM_x(t)}{dt^r}\right|_{t=0} = \mu'_r \]

This follows from the fact that if a function is expanded as a power series in t, the coefficient of \(\frac{t^r}{r!}\) is the rth derivative of the function with respect to t at t = 0.


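EXAMPLE 4.14
Given that x has the probability distribution \(f(x) = \frac{1}{8}\binom{3}{x}\) for x = 0, 1, 2, and 3, find the moment-generating function of this random variable and use it to determine \(\mu'_1\) and \(\mu'_2\).

Solution
Substituting in accordance with Definition 4.6, we get
\[ M_x(t) = E(e^{tx}) = \frac{1}{8}\sum_{x=0}^{3} e^{tx}\binom{3}{x} = \frac{1}{8}(1 + 3e^t + 3e^{2t} + e^{3t}) = \frac{1}{8}(1 + e^t)^3 \]
Then, by Theorem 4.9,
\[ \mu'_1 = M'_x(0) = \left.\frac{3}{8}(1 + e^t)^2 e^t\right|_{t=0} = \frac{3}{2} \]
and
\[ \mu'_2 = M''_x(0) = \left.\left[\frac{3}{4}(1 + e^t)e^{2t} + \frac{3}{8}(1 + e^t)^2 e^t\right]\right|_{t=0} = 3 \]
▲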
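Theorem 4.9 can be illustrated numerically by approximating the derivatives of \(M_x(t)\) at t = 0 with central finite differences and comparing them with moments computed directly from the distribution of Example 4.14. The step size h = 1e-4 is an assumption of this Python sketch, chosen to balance truncation and rounding error.

```python
import math
from math import comb


def mgf(t: float) -> float:
    """M_x(t) = sum of e^{tx} * C(3, x) / 8 over x = 0, 1, 2, 3."""
    return sum(math.exp(t * x) * comb(3, x) / 8 for x in range(4))


h = 1e-4
first = (mgf(h) - mgf(-h)) / (2 * h)                 # central difference for M'(0)
second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2  # central difference for M''(0)

# Moments computed directly from the distribution, for comparison
mu1 = sum(x * comb(3, x) / 8 for x in range(4))
mu2 = sum(x * x * comb(3, x) / 8 for x in range(4))
print(round(first, 6), round(second, 4), mu1, mu2)
```

Both routes give \(\mu'_1 = 1.5\) and \(\mu'_2 = 3\), up to the small error of the finite differences.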


Often the work involved in using moment-generating functions can be 
simplified by making use of the following theorem: 


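THEOREM 4.10 If a and b are constants, then

1. \(M_{x+a}(t) = E[e^{(x+a)t}] = e^{at} \cdot M_x(t)\)
2. \(M_{bx}(t) = E(e^{bxt}) = M_x(bt)\)
3. \(M_{\frac{x+a}{b}}(t) = E\left[e^{\left(\frac{x+a}{b}\right)t}\right] = e^{\frac{a}{b}t} \cdot M_x\!\left(\frac{t}{b}\right)\)

The proof of this theorem will be left to the reader in Exercise 22 on page 159. As we shall see later, the first part of the theorem is of special importance when \(a = -\mu\), and the third part is of special importance when \(a = -\mu\) and \(b = \sigma\), in which case
\[ M_{\frac{x-\mu}{\sigma}}(t) = e^{-\frac{\mu t}{\sigma}} \cdot M_x\!\left(\frac{t}{\sigma}\right) \]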
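Part 3 of Theorem 4.10 in the standardizing case \(a = -\mu\), \(b = \sigma\) can be spot-checked numerically: the moment-generating function of \((x - \mu)/\sigma\) computed straight from the distribution should match \(e^{-\mu t/\sigma} \cdot M_x(t/\sigma)\). This Python sketch borrows the distribution of Example 4.14 (for which \(\mu = \frac{3}{2}\) and \(\sigma^2 = 3 - \frac{9}{4} = \frac{3}{4}\)); the choice of distribution and of test points t is ours.

```python
import math
from math import comb

# Distribution of Example 4.14: f(x) = C(3, x) / 8 for x = 0, 1, 2, 3
pmf = {x: comb(3, x) / 8 for x in range(4)}


def mgf_x(t):
    return sum(p * math.exp(t * x) for x, p in pmf.items())


mu = 1.5                  # mu'_1 from Example 4.14
sigma = math.sqrt(0.75)   # sigma^2 = mu'_2 - mu^2 = 3 - 9/4 (Theorem 4.6)
a, b = -mu, sigma         # the standardizing choice a = -mu, b = sigma


def mgf_z_direct(t):
    """E[e^{t(x + a)/b}], computed straight from the distribution of x."""
    return sum(p * math.exp(t * (x + a) / b) for x, p in pmf.items())


def mgf_z_theorem(t):
    """Part 3 of Theorem 4.10: e^{(a/b)t} * M_x(t/b)."""
    return math.exp(a * t / b) * mgf_x(t / b)


for t in (-0.5, 0.1, 0.7):
    print(t, mgf_z_direct(t), mgf_z_theorem(t))
```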


THEORETICAL EXERCISES 


1. Show that \(\mu_0 = 1\) and \(\mu_1 = 0\) for any random variable for which E(x) exists.


2. Find \(\mu\), \(\mu'_2\), and \(\sigma^2\) for a random variable which has the probability distribution \(f(x) = \frac{1}{2}\) for \(x = -2\) and \(x = 2\).


3. Find \(\mu\), \(\mu'_2\), and \(\sigma\) for a random variable which has the probability density


for0<x<2 


f(x) = 


© NIX 


elsewhere 


4. Find \(\mu'_r\) and \(\sigma^2\) for a random variable x which has the probability density


Wes 
f(x) md forl€ x «3 
x)- 


0 elsewhere 


5. Prove Theorem 4.7.


6. With reference to Exercise 8 on page 144, find the variance of \(g(x) = 2x + 3\).

7. If the random variable x has the mean \(\mu\) and the variance \(\sigma^2\), show that the random variable z, whose values are related to those of x by means of the equation \(z = \frac{x - \mu}{\sigma}\), has E(z) = 0 and var(z) = 1. A distribution which has the mean 0 and the variance 1 is said to be in standard form, and when we perform the above change of variable, \(z = \frac{x - \mu}{\sigma}\), we are said to be standardizing the distribution of x.


8. If the probability density of the random variable x is given by


оху forix». d 
f(x) = [ elsewhere 
check whether its mean and its variance exist. 
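9. Show that
\[ \mu_r = \mu'_r - \binom{r}{1}\mu'_{r-1} \cdot \mu + \cdots + (-1)^i\binom{r}{i}\mu'_{r-i} \cdot \mu^i + \cdots + (-1)^{r-1}(r - 1) \cdot \mu^r \]
for \(r = 1, 2, 3, \ldots\), and use this formula to determine expressions for \(\mu_3\) and \(\mu_4\).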

10. The symmetry or skewness (lack of symmetry) of a distribution is often measured by means of the quantity
\[ \alpha_3 = \frac{\mu_3}{\sigma^3} \]
Use the formula for \(\mu_3\) obtained in Exercise 9 to find \(\alpha_3\) for each of the following distributions, which (as can easily be verified) have equal means and standard deviations:

(a) f(1) = 0.05, f(2) = 0.15, f(3) = 0.30, f(4) = 0.30, f(5) = 0.15, and f(6) = 0.05;

(b) f(1) = 0.05, f(2) = 0.20, f(3) = 0.15, f(4) = 0.45, f(5) = 0.10, and f(6) = 0.05.

Also draw histograms of the two distributions and note that whereas the first is symmetrical, the second has a "tail" on the left-hand side and is said to be negatively skewed.


11. The extent to which a distribution is peaked or flat, also called the kurtosis of the distribution, is often measured by means of the quantity
\[ \alpha_4 = \frac{\mu_4}{\sigma^4} \]
Use the formula for \(\mu_4\) obtained in Exercise 9 to find \(\alpha_4\) for each of the following symmetrical distributions, of which the first is more peaked (narrow humped) than the second:

(a) f(−3) = 0.06, f(−2) = 0.09, f(−1) = 0.10, f(0) = 0.50, f(1) = 0.10, f(2) = 0.09, and f(3) = 0.06;

(b) f(−3) = 0.04, f(−2) = 0.11, f(−1) = 0.20, f(0) = 0.30, f(1) = 0.20, f(2) = 0.11, and f(3) = 0.04.

12. Duplicating the steps used in the text, prove Chebyshev's theorem for a discrete random variable x.


13. Show that if x is a random variable with non-negative values (that is, f(x) = 0 for x < 0) and the mean \(\mu\), then for any positive constant a,
\[ P(x \ge a) \le \frac{\mu}{a} \]
This inequality is called Markov's inequality, and we have given it here mainly because it leads to a relatively easy alternative proof of Chebyshev's theorem.


14. Use the inequality of Exercise 13 to prove Chebyshev's theorem. [Hint: Substitute \((x - \mu)^2\) for x.]


15. What is the least value of k in Chebyshev's theorem for which the probability that a random variable takes on a value between \(\mu - k\sigma\) and \(\mu + k\sigma\) is
(a) at least 0.95;

(b) at least 0.99?


16. If we let \(k\sigma = c\) in Chebyshev's theorem, what does this theorem assert about the probability that a random variable will take on a value between \(\mu - c\) and \(\mu + c\)?


17. Find the moment-generating function of the discrete random variable x which has the probability distribution \(f(x) = 2\left(\frac{1}{3}\right)^x\) for \(x = 1, 2, 3, \ldots\), and use it to determine the values of \(\mu'_1\) and \(\mu'_2\).


18. Find the moment-generating function of the continuous random variable x whose probability density is given by
\[ f(x) = \begin{cases} 1 & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases} \]
and use it to find \(\mu'_1\), \(\mu'_2\), and \(\sigma^2\).

19. If we let \(R_x(t) = \ln M_x(t)\), show that \(R'_x(0) = \mu\) and \(R''_x(0) = \sigma^2\). Also, use
these results to find the mean and the variance of a random variable x having 
the moment-generating function 


M,(t) = @*ё'-\) 


20. Explain why there can be no random variable for which \(M_x(t) = \frac{t}{1 - t}\).

21. Show that if a random variable has the probability density \(f(x) = \frac{1}{2}e^{-|x|}\) for \(-\infty < x < \infty\), its moment-generating function is given by
\[ M_x(t) = \frac{1}{1 - t^2} \]
Also find the variance of the distribution of this random variable

(a) by expanding the moment-generating function as an infinite series and reading off the necessary coefficients;

(b) by using Theorem 4.9.


22. Prove all three parts of Theorem 4.10.

23. Given the moment-generating function \(M_x(t) = e^{3t + 8t^2}\) of the random variable x, find the moment-generating function of the random variable \(z = \frac{1}{4}(x - 3)\), and use it to find the mean and the variance of z.
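The formula of Exercise 9 is the binomial expansion of \(E[(x - \mu)^r]\) with its last two terms combined by means of \(\mu'_1 = \mu\). A short Python sketch can check, for one illustrative distribution (the balanced die, our choice), that computing \(\mu_r\) from the moments about the origin agrees with Definition 4.4. It is a numerical check under that assumption, not the requested proof; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def raw(pmf, r):
    """mu'_r = E(x^r) for a discrete pmf {x: f(x)}."""
    return sum((Fraction(x) ** r * p for x, p in pmf.items()), Fraction(0))


def central_from_raw(pmf, r):
    """mu_r via the binomial expansion sum_{i=0}^{r} C(r, i) (-1)^i mu'_{r-i} mu^i."""
    mu = raw(pmf, 1)
    return sum(comb(r, i) * (-1) ** i * raw(pmf, r - i) * mu ** i for i in range(r + 1))


def central_direct(pmf, r):
    """mu_r = E[(x - mu)^r] computed straight from Definition 4.4."""
    mu = raw(pmf, 1)
    return sum((Fraction(x) - mu) ** r * p for x, p in pmf.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}
for r in range(5):
    print(r, central_from_raw(die, r), central_direct(die, r))
```

For the die the two columns agree, with \(\mu_2 = \frac{35}{12}\) and \(\mu_3 = 0\) because the distribution is symmetrical.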


APPLIED EXERCISES 


24. With reference to Example 4.1 on page 135, find the variance of the distribution of the number of defective sets.

25. The length of time for one individual to be served at a cafeteria is a random
variable with the probability density 


ое" 
ie* forx > 0 
frs |! elsewhere 


Find the mean and the variance of this distribution. 
26. With reference to Exercise 17 on page 101, find the mean and the variance of the distribution of the random variable.


27. With reference to Exercise 18 on page 88, find the mean and the variance of
the weekly number of accidents at the intersection. 


28. The following are some applications of the Markov inequality of Exercise 13:

(a) The scores which high school juniors get on the verbal part of the
PSAT/NMSQT test may be looked upon as values of a random variable
with the mean \(\mu = 41\). Find an upper bound to the probability that
one of the students will get a score of 65 or more. 

(b) The weight of certain animals may be looked upon as a random variable 
with a mean of 212 grams. If none of the animals weighs less than 165 
grams, find an upper bound to the probability that such an animal will 
weigh at least 250 grams. 


29. The number of marriage licenses issued in a certain city during the month
of June may be looked upon as a random variable with \(\mu = 124\) and \(\sigma = 7.5\).
According to Chebyshev's theorem, with what probability can we assert that 
between 64 and 184 marriage licenses will be issued there during a month 
of June? 


30. A study of the nutritional value of a certain kind of bread shows that the
amount of thiamine (vitamin B1) in a slice may be looked upon as a random
variable with \(\mu = 0.260\) milligram and \(\sigma = 0.005\) milligram. According to
Chebyshev’s theorem, between what values must be the thiamine content of 
(a) at least 3$ of all slices of this bread; 

(b) at least 133 of all slices of this bread? 


31. With reference to Exercise 25, what can we assert about the length of time
it takes a person to be served at the cafeteria, if we use Chebyshev's theorem
with k = 1.5? What is the corresponding exact probability?


4.6 PRODUCT MOMENTS


To continue the discussion of Section 4.3, let us now present the various product 
moments of two random variables. 


DEFINITION 4.7 The rth and sth product moment about the origin of the random variables x and y, denoted by \(\mu'_{r,s}\), is the expected value of \(x^ry^s\); symbolically,
\[ \mu'_{r,s} = E(x^ry^s) = \sum_x \sum_y x^ry^s \cdot f(x, y) \]
for \(r = 0, 1, 2, \ldots\), and \(s = 0, 1, 2, \ldots\), when x and y are discrete, and
\[ \mu'_{r,s} = E(x^ry^s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^ry^s \cdot f(x, y)\,dx\,dy \]
when x and y are continuous.


In the discrete case, the double summation extends over the entire joint range of the two random variables. Note that \(\mu'_{1,0} = E(x)\), which will be denoted here by \(\mu_x\), and that \(\mu'_{0,1} = E(y)\), which will be denoted here by \(\mu_y\).

Analogous to Definition 4.4, let us now make the following definition of product moments about the respective means:

DEFINITION 4.8 The rth and sth product moment about the respective means of the random variables x and y, denoted by \(\mu_{r,s}\), is the expected value of \((x - \mu_x)^r(y - \mu_y)^s\); symbolically,
\[ \mu_{r,s} = E[(x - \mu_x)^r(y - \mu_y)^s] = \sum_x \sum_y (x - \mu_x)^r(y - \mu_y)^s \cdot f(x, y) \]
for \(r = 0, 1, 2, \ldots\), and \(s = 0, 1, 2, \ldots\), when x and y are discrete, and
\[ \mu_{r,s} = E[(x - \mu_x)^r(y - \mu_y)^s] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_x)^r(y - \mu_y)^s \cdot f(x, y)\,dx\,dy \]
when x and y are continuous.


In statistics, \(\mu_{1,1}\) is of special importance because it is indicative of the relationship, if any, between the values of x and y; thus, it is given a special symbol and a special name.

DEFINITION 4.9 \(\mu_{1,1}\) is called the covariance of x and y, and it is denoted by \(\sigma_{xy}\), cov(x, y), or C(x, y).


Observe that if there is a high probability that large values of x will go with large 
values of y and small values of x with small values of y, the covariance will be 
positive; if there is a high probability that large values of x will go with small 




values of y and vice versa, the covariance will be negative. It is in this sense that 
the covariance measures the relationship, or association, between the values of 
x and y. 

Analogous to Theorem 4.6, let us now prove the following result, which is useful in actually determining the values of covariances:

THEOREM 4.11
\[ \sigma_{xy} = \mu'_{1,1} - \mu_x\mu_y \]

Proof. Using the various theorems about expected values, we can write
\[ \begin{aligned} \sigma_{xy} &= E[(x - \mu_x)(y - \mu_y)] \\ &= E(xy - x\mu_y - y\mu_x + \mu_x\mu_y) \\ &= E(xy) - \mu_yE(x) - \mu_xE(y) + \mu_x\mu_y \\ &= E(xy) - \mu_y\mu_x - \mu_x\mu_y + \mu_x\mu_y \\ &= \mu'_{1,1} - \mu_x\mu_y \end{aligned} \]
▼


EXAMPLE 4.15 


In Example 3.20 on page 118, the joint and marginal probabilities of the two 
random variables x and y, representing, respectively, the number of aspirin tablets 
and the number of sedative tablets among two tablets drawn from a bottle 
containing 3 aspirin, 2 sedative, and 4 laxative tablets, were recorded as follows: 


                x
          0      1      2
y   0    1/6    1/3    1/12    7/12
    1    2/9    1/6            7/18
    2    1/36                  1/36
         5/12   1/2    1/12


Find the covariance of x and y. 


Sec. 4.6.: Product Moments 163 


Solution 


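Referring to the table of joint probabilities, we get
\[ \mu'_{1,1} = E(xy) = 0 \cdot 0 \cdot \tfrac{1}{6} + 0 \cdot 1 \cdot \tfrac{2}{9} + 0 \cdot 2 \cdot \tfrac{1}{36} + 1 \cdot 0 \cdot \tfrac{1}{3} + 1 \cdot 1 \cdot \tfrac{1}{6} + 2 \cdot 0 \cdot \tfrac{1}{12} = \tfrac{1}{6} \]
and, using the marginal probabilities,
\[ \mu_x = 0 \cdot \tfrac{5}{12} + 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{12} = \tfrac{2}{3}, \qquad \mu_y = 0 \cdot \tfrac{7}{12} + 1 \cdot \tfrac{7}{18} + 2 \cdot \tfrac{1}{36} = \tfrac{4}{9} \]
It follows that
\[ \sigma_{xy} = \tfrac{1}{6} - \tfrac{2}{3} \cdot \tfrac{4}{9} = -\tfrac{7}{54} \]
The negative result suggests that the more aspirin tablets we get the fewer sedative tablets we will get, and vice versa, and this, of course, makes sense. ▲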
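The covariance in Example 4.15 can be recomputed by enumerating the joint distribution. This Python sketch assumes, as the description of Example 3.20 implies, that the joint probabilities are hypergeometric, \(f(x, y) = \binom{3}{x}\binom{2}{y}\binom{4}{2 - x - y} / \binom{9}{2}\) for \(x + y \le 2\); it then applies Theorem 4.11.

```python
from fractions import Fraction
from math import comb

# Two tablets drawn from a bottle with 3 aspirin, 2 sedative, and 4 laxative tablets
TOTAL = comb(9, 2)


def f(x: int, y: int) -> Fraction:
    """Joint probability of getting x aspirin and y sedative tablets among the two drawn."""
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(2, y) * comb(4, 2 - x - y), TOTAL)


support = [(x, y) for x in range(3) for y in range(3) if x + y <= 2]
mu_x = sum(x * f(x, y) for x, y in support)
mu_y = sum(y * f(x, y) for x, y in support)
mu_11 = sum(x * y * f(x, y) for x, y in support)   # mu'_{1,1} = E(xy)
cov = mu_11 - mu_x * mu_y                           # Theorem 4.11
print(mu_x, mu_y, mu_11, cov)  # 2/3 4/9 1/6 -7/54
```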


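EXAMPLE 4.16
Find the covariance of the two random variables whose joint density is given by
\[ f(x, y) = \begin{cases} 2 & \text{for } x > 0,\ y > 0,\ x + y < 1 \\ 0 & \text{elsewhere} \end{cases} \]

Solution

Evaluating the necessary integrals, we get
\[ \mu_x = \int_0^1\int_0^{1-x} 2x\,dy\,dx = \frac{1}{3}, \qquad \mu_y = \int_0^1\int_0^{1-x} 2y\,dy\,dx = \frac{1}{3} \]
and
\[ \mu'_{1,1} = \int_0^1\int_0^{1-x} 2xy\,dy\,dx = \frac{1}{12} \]
It follows that
\[ \sigma_{xy} = \frac{1}{12} - \frac{1}{3} \cdot \frac{1}{3} = -\frac{1}{36} \]
▲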
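The iterated integrals of Example 4.16 reduce to one-dimensional polynomial integrals once the inner integration over \(0 < y < 1 - x\) is done by hand. A Python sketch that finishes the outer integrals with Simpson's rule (our choice; it is exact for polynomials of degree at most 3, up to rounding) and then applies Theorem 4.11:

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule; exact (up to rounding) for polynomials of degree <= 3."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


# Inner integrals over 0 < y < 1 - x, worked out by hand:
#   integral of 2x dy = 2x(1 - x),  of 2y dy = (1 - x)^2,  of 2xy dy = x(1 - x)^2
mu_x = simpson(lambda x: 2 * x * (1 - x), 0.0, 1.0)
mu_y = simpson(lambda x: (1 - x) ** 2, 0.0, 1.0)
mu_11 = simpson(lambda x: x * (1 - x) ** 2, 0.0, 1.0)
cov = mu_11 - mu_x * mu_y   # Theorem 4.11
print(round(mu_x, 6), round(mu_y, 6), round(mu_11, 6), round(cov, 6))
```

The printed values agree with \(\frac{1}{3}\), \(\frac{1}{3}\), \(\frac{1}{12}\), and \(-\frac{1}{36}\) to six decimals.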


So far as the relationship between x and y is concerned, observe that if x and y are independent, their covariance is zero; symbolically,

THEOREM 4.12 If x and y are independent, then \(E(xy) = E(x) \cdot E(y)\) and \(\sigma_{xy} = 0\).

Proof. For the discrete case, we have by definition
\[ E(xy) = \sum_x \sum_y xy \cdot f(x, y) \]
Since x and y are independent, we can write \(f(x, y) = g(x)h(y)\), where g(x) and h(y) are the values of the respective marginal distributions of x and y, and we get
\[ \begin{aligned} E(xy) &= \sum_x \sum_y xy \cdot g(x)h(y) \\ &= \left[\sum_x x \cdot g(x)\right]\left[\sum_y y \cdot h(y)\right] \\ &= E(x) \cdot E(y) \end{aligned} \]
Hence,
\[ \begin{aligned} \sigma_{xy} &= \mu'_{1,1} - \mu_x\mu_y \\ &= E(x) \cdot E(y) - E(x) \cdot E(y) \\ &= 0 \end{aligned} \]
▼


It is of interest to note that the independence of two random variables 
implies a zero covariance, but a zero covariance does not necessarily imply their 
independence. This is illustrated by the following example. 
EXAMPLE 4.17


Given two discrete random variables x and y with the joint probability distribution


show that the covariance is zero even though the two random variables are not independent.


Solution
Using the values in the table as well as the marginal totals, we get
\[ \mu_x = (-1) \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} = 0 \]
\[ \mu_y = (-1) \cdot \tfrac{2}{3} + 0 \cdot 0 + 1 \cdot \tfrac{1}{3} = -\tfrac{1}{3} \]
and
\[ \mu'_{1,1} = (-1)(-1) \cdot \tfrac{1}{6} + 0(-1) \cdot \tfrac{1}{3} + 1(-1) \cdot \tfrac{1}{6} + (-1)(1) \cdot \tfrac{1}{6} + 1 \cdot 1 \cdot \tfrac{1}{6} = 0 \]
Thus, \(\sigma_{xy} = 0 - 0\left(-\tfrac{1}{3}\right) = 0\), but the two random variables are not independent since \(f(x, y) \ne g(x) \cdot h(y)\), for example, for \(x = -1\) and \(y = -1\). ▲
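The same phenomenon can be seen in an even simpler case. The following Python sketch uses a hypothetical illustration, not the table of Example 4.17: x takes the values −1, 0, 1 with probability 1/3 each and \(y = x^2\). Then y is completely determined by x, yet the covariance is zero, because the relationship is not linear.

```python
from fractions import Fraction

# Hypothetical illustration (not the table of Example 4.17):
# x uniform on {-1, 0, 1} and y = x**2, so y is a function of x.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}


def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` (0 for x, 1 for y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out


g, h = marginal(joint, 0), marginal(joint, 1)
mu_x = sum(x * p for x, p in g.items())
mu_y = sum(y * p for y, p in h.items())
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y   # Theorem 4.11

# Independence would require f(x, y) = g(x) h(y) for every pair, including zero cells.
independent = all(joint.get((x, y), 0) == g[x] * h[y] for x in g for y in h)
print(cov, independent)  # 0 False
```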


Product moments can also be defined for the case where there are more 
than two random variables. Here, let us merely state the important result that 


THEOREM 4.13 If \(x_1, x_2, \ldots, x_n\) are independent, then
\[ E(x_1x_2 \cdots x_n) = E(x_1) \cdot E(x_2) \cdots E(x_n) \]

This is a generalization of the first part of Theorem 4.12; in fact, the proof of this theorem, based on Definition 3.14, is essentially like that of the first part of Theorem 4.12.


4.7 MOMENTS OF LINEAR COMBINATIONS
OF RANDOM VARIABLES


In this section we shall derive expressions for the mean and the variance of a 
linear combination of n random variables and the covariance of two linear 
combinations of n random variables. Applications of these results will be treated 
later in our discussion of sampling theory and problems of statistical inference. 


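THEOREM 4.14 If \(x_1, x_2, \ldots, x_n\) are random variables and
\[ y = \sum_{i=1}^{n} a_ix_i \]
where \(a_1, a_2, \ldots, a_n\) are constants, then
\[ E(y) = \sum_{i=1}^{n} a_iE(x_i) \]
and
\[ \operatorname{var}(y) = \sum_{i=1}^{n} a_i^2 \cdot \operatorname{var}(x_i) + 2\sum_{i<j}\sum a_ia_j \cdot \operatorname{cov}(x_i, x_j) \]
where the double sum extends over all the values of i and j, from 1 to n, for which i < j.

Proof. From Theorem 4.5 with \(g_i(x_1, x_2, \ldots, x_n) = x_i\) for \(i = 1, 2, \ldots, n\), it follows immediately that
\[ E(y) = E\left(\sum_{i=1}^{n} a_ix_i\right) = \sum_{i=1}^{n} a_iE(x_i) \]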


Sec. 4.7.: Moments of Linear Combinations of Random Variables 167 


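and this proves the first part of the theorem. To obtain the expression for the variance of y, let us write \(\mu_i\) for \(E(x_i)\), so that we get
\[ \operatorname{var}(y) = E\left([y - E(y)]^2\right) = E\left\{\left[\sum_{i=1}^{n} a_ix_i - \sum_{i=1}^{n} a_i\mu_i\right]^2\right\} = E\left\{\left[\sum_{i=1}^{n} a_i(x_i - \mu_i)\right]^2\right\} \]
Then expanding by means of the multinomial theorem, according to which \((a + b + c + d)^2\), for example, equals \(a^2 + b^2 + c^2 + d^2 + 2ab + 2ac + 2ad + 2bc + 2bd + 2cd\), and again referring to Theorem 4.5, we get
\[ \begin{aligned} \operatorname{var}(y) &= \sum_{i=1}^{n} a_i^2E[(x_i - \mu_i)^2] + 2\sum_{i<j}\sum a_ia_jE[(x_i - \mu_i)(x_j - \mu_j)] \\ &= \sum_{i=1}^{n} a_i^2 \cdot \operatorname{var}(x_i) + 2\sum_{i<j}\sum a_ia_j \cdot \operatorname{cov}(x_i, x_j) \end{aligned} \]
Note that we have tacitly made use of the fact that \(\operatorname{cov}(x_i, x_j) = \operatorname{cov}(x_j, x_i)\). ▼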


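Since \(\operatorname{cov}(x_i, x_j) = 0\) when \(x_i\) and \(x_j\) are independent, it follows immediately that

COROLLARY If the random variables \(x_1, x_2, \ldots, x_n\) are independent and \(y = \sum_{i=1}^{n} a_ix_i\), then
\[ \operatorname{var}(y) = \sum_{i=1}^{n} a_i^2 \cdot \operatorname{var}(x_i) \]

EXAMPLE 4.18
If the random variables x, y, and z have the means \(\mu_x = 2\), \(\mu_y = -3\), \(\mu_z = 4\), the variances \(\sigma_x^2 = 1\), \(\sigma_y^2 = 5\), \(\sigma_z^2 = 2\), and the covariances \(\operatorname{cov}(x, y) = -2\), \(\operatorname{cov}(x, z) = -1\), \(\operatorname{cov}(y, z) = 1\), find the mean and the variance of
\[ w = 3x - y + 2z \]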
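Solution
By Theorem 4.14,
\[ \begin{aligned} E(w) &= E(3x - y + 2z) \\ &= 3E(x) - E(y) + 2E(z) \\ &= 3 \cdot 2 - (-3) + 2 \cdot 4 \\ &= 17 \end{aligned} \]
and
\[ \begin{aligned} \operatorname{var}(w) &= 9\operatorname{var}(x) + \operatorname{var}(y) + 4\operatorname{var}(z) - 6\operatorname{cov}(x, y) + 12\operatorname{cov}(x, z) - 4\operatorname{cov}(y, z) \\ &= 9 \cdot 1 + 5 + 4 \cdot 2 - 6(-2) + 12(-1) - 4 \cdot 1 \\ &= 18 \end{aligned} \]
▲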
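Theorem 4.14 says that the variance of \(\sum a_ix_i\) is the sum of \(a_ia_j\operatorname{cov}(x_i, x_j)\) over all ordered pairs (i, j), with the variances on the diagonal. A Python sketch that stores the variances and covariances of Example 4.18 in a symmetric table (a layout we chose for convenience) and reproduces the mean 17 and variance 18:

```python
def linear_combination_moments(a, means, cov):
    """Theorem 4.14 for y = sum_i a_i x_i, where cov[i][j] = cov(x_i, x_j) and cov[i][i] = var(x_i).

    Summing a_i a_j cov(x_i, x_j) over all ordered pairs (i, j) equals
    sum_i a_i^2 var(x_i) + 2 sum_{i<j} a_i a_j cov(x_i, x_j), because cov is symmetric.
    """
    n = len(a)
    mean = sum(a[i] * means[i] for i in range(n))
    var = sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n))
    return mean, var


# Example 4.18: variables ordered (x, y, z), w = 3x - y + 2z
means = [2, -3, 4]
cov = [[1, -2, -1],
       [-2, 5, 1],
       [-1, 1, 2]]
print(linear_combination_moments([3, -1, 2], means, cov))  # (17, 18)
```

Storing both \(\operatorname{cov}(x_i, x_j)\) and \(\operatorname{cov}(x_j, x_i)\) is what turns the "2 times the sum over i < j" into a plain double sum.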


The following is another important theorem about linear combinations of 
random variables; it concerns the covariance of two linear combinations of n 
random variables: 


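THEOREM 4.15 If \(x_1, x_2, \ldots, x_n\) are random variables and
\[ y_1 = \sum_{i=1}^{n} a_ix_i \quad \text{and} \quad y_2 = \sum_{i=1}^{n} b_ix_i \]
where \(a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n\) are constants, then
\[ \operatorname{cov}(y_1, y_2) = \sum_{i=1}^{n} a_ib_i \cdot \operatorname{var}(x_i) + \sum_{i<j}\sum (a_ib_j + a_jb_i) \cdot \operatorname{cov}(x_i, x_j) \]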


The proof of this theorem, which is very similar to that of Theorem 4.14, will be 
left to the reader in Exercise 10 on page 172. 
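Since \(\operatorname{cov}(x_i, x_j) = 0\) when \(x_i\) and \(x_j\) are independent, it follows immediately that

COROLLARY If the random variables \(x_1, x_2, \ldots, x_n\) are independent, \(y_1 = \sum_{i=1}^{n} a_ix_i\), and \(y_2 = \sum_{i=1}^{n} b_ix_i\), then
\[ \operatorname{cov}(y_1, y_2) = \sum_{i=1}^{n} a_ib_i \cdot \operatorname{var}(x_i) \]

EXAMPLE 4.19

If the random variables x, y, and z have the means \(\mu_x = 3\), \(\mu_y = 5\), \(\mu_z = 2\), the variances \(\sigma_x^2 = 8\), \(\sigma_y^2 = 12\), \(\sigma_z^2 = 18\), and the covariances \(\operatorname{cov}(x, y) = 1\), \(\operatorname{cov}(x, z) = -3\), \(\operatorname{cov}(y, z) = 2\), find cov(u, v), where \(u = x + 4y + 2z\) and \(v = 3x - y - z\).

Solution
By Theorem 4.15,
\[ \begin{aligned} \operatorname{cov}(u, v) &= \operatorname{cov}(x + 4y + 2z,\ 3x - y - z) \\ &= 3\operatorname{var}(x) - 4\operatorname{var}(y) - 2\operatorname{var}(z) + 11\operatorname{cov}(x, y) + 5\operatorname{cov}(x, z) - 6\operatorname{cov}(y, z) \\ &= 3 \cdot 8 - 4 \cdot 12 - 2 \cdot 18 + 11 \cdot 1 + 5(-3) - 6 \cdot 2 \\ &= -76 \end{aligned} \]
▲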
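Theorem 4.15 can be read the same way: \(\operatorname{cov}(y_1, y_2)\) is the sum of \(a_ib_j\operatorname{cov}(x_i, x_j)\) over all ordered pairs (i, j). Grouping the pairs (i, j) and (j, i) gives the coefficients \(a_ib_j + a_jb_i\) that appear in the theorem. A Python sketch reproducing Example 4.19, with the variances and covariances stored in a symmetric table of our choosing:

```python
def cov_of_combinations(a, b, cov):
    """cov(sum_i a_i x_i, sum_j b_j x_j) = sum over all (i, j) of a_i b_j cov(x_i, x_j).

    With cov symmetric, this regroups into Theorem 4.15:
    sum_i a_i b_i var(x_i) + sum_{i<j} (a_i b_j + a_j b_i) cov(x_i, x_j).
    """
    n = len(a)
    return sum(a[i] * b[j] * cov[i][j] for i in range(n) for j in range(n))


# Example 4.19: variables ordered (x, y, z), u = x + 4y + 2z, v = 3x - y - z
cov = [[8, 1, -3],
       [1, 12, 2],
       [-3, 2, 18]]
print(cov_of_combinations([1, 4, 2], [3, -1, -1], cov))  # -76
```

Setting a = b recovers the variance formula of Theorem 4.14, since \(\operatorname{cov}(y, y) = \operatorname{var}(y)\).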


4.8 CONDITIONAL EXPECTATIONS


In Section 3.7 we obtained conditional probabilities by adding the values of 
conditional probability distributions, or integrating the values of conditional 
density functions. Conditional expectations of random variables are likewise 
defined in terms of their conditional distributions. 
DEFINITION 4.10 If x is a discrete random variable and f(x|y) is the value of the conditional probability distribution of x given y = y at x, the conditional expectation of u(x) given y = y is

$$E[u(x)\mid y] = \sum_{x} u(x)\cdot f(x\mid y)$$

Correspondingly, if x is a continuous random variable and f(x|y) is the value of the conditional probability density of x given y = y at x, the conditional expectation of u(x) given y = y is

$$E[u(x)\mid y] = \int_{-\infty}^{\infty} u(x)\cdot f(x\mid y)\,dx$$


Similar expressions based on the conditional probability distribution or density 
of y given x = x define the conditional expectation of v(y) given x = x. 

If we let u(x) = x in Definition 4.10, we obtain the conditional mean of the random variable x given y = y, which we denote by

$$\mu_{x|y} = E(x\mid y)$$


The conditional variance of x given y = y is

$$\sigma^2_{x|y} = E[(x - \mu_{x|y})^2 \mid y] = E(x^2\mid y) - \mu^2_{x|y}$$

where $E(x^2\mid y)$ is given by Definition 4.10 with $u(x) = x^2$. The reader should not experience any difficulty in generalizing Definition 4.10 for conditional expectations involving more than two random variables.


EXAMPLE 4.20 
If the joint density function of two random variables x and y is given by

$$f(x, y) = \begin{cases} \tfrac{2}{3}(x + 2y) & \text{for } 0 < x < 1,\; 0 < y < 1\\ 0 & \text{elsewhere} \end{cases}$$

find the conditional mean and the conditional variance of x given y = 1/2.
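Solution

In Example 3.24 on page 124 we showed that the conditional density of x given y = y is

$$f(x\mid y) = \begin{cases} \dfrac{2x + 4y}{1 + 4y} & \text{for } 0 < x < 1\\ 0 & \text{elsewhere} \end{cases}$$

so that

$$f(x\mid \tfrac12) = \begin{cases} \tfrac{2}{3}(x + 1) & \text{for } 0 < x < 1\\ 0 & \text{elsewhere} \end{cases}$$

Thus, $\mu_{x|1/2}$ is given by

$$E(x\mid \tfrac12) = \int_0^1 \tfrac{2}{3}\,x(x + 1)\,dx = \tfrac{5}{9}$$

Next we find

$$E(x^2\mid \tfrac12) = \int_0^1 \tfrac{2}{3}\,x^2(x + 1)\,dx = \tfrac{7}{18}$$

and it follows that

$$\sigma^2_{x|1/2} = \tfrac{7}{18} - \left(\tfrac{5}{9}\right)^2 = \tfrac{13}{162}$$ ▲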
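The conditional density in Example 4.20 is linear in x, so its moments can be integrated exactly term by term. Below is a small sketch that checks the results with Python's `fractions` module. It assumes the conditional density $f(x|y) = (2x + 4y)/(1 + 4y)$ on $0 < x < 1$ taken from the solution. The helper names are hypothetical.

```python
from fractions import Fraction


def cond_density_coeffs(y):
    """Coefficients [c0, c1] of f(x|y) = c0 + c1*x = (2x + 4y)/(1 + 4y), 0 < x < 1."""
    y = Fraction(y)
    denom = 1 + 4 * y
    return [4 * y / denom, Fraction(2) / denom]


def cond_moment(k, y):
    """E(x^k | y): integrate x^k * (c0 + c1*x) over (0, 1) exactly.

    Uses the fact that the integral of x^m over (0, 1) is 1/(m + 1).
    """
    return sum(c / (k + j + 1) for j, c in enumerate(cond_density_coeffs(y)))


y = Fraction(1, 2)
mean = cond_moment(1, y)             # conditional mean
var = cond_moment(2, y) - mean ** 2  # E(x^2|y) - mean^2
print(mean, var)  # 5/9 13/162
```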


THEORETICAL EXERCISES 
1. If x and y have the joint probability distribution f(x, y) = 1/4 for x = −3 and y = −5, x = −1 and y = −1, x = 1 and y = 1, and x = 3 and y = 5, find cov(x, y).
2. With reference to Exercise 1 on page 113, find cov(x, y). 
3. With reference to Example 3.22 on page 121, find $\operatorname{cov}(x_1, x_2)$.
4. With reference to Exercise 6 on page 129, find cov(x, y). 

5. If the joint distribution of x and y has the values f(−1, 0) = 0, f(−1, 1) = 1/4, f(0, 0) = 1/6, f(0, 1) = 0, f(1, 0) = 1/12, and f(1, 1) = 1/2, show that x and y are dependent, and find their covariance.


6. For k random variables $x_1, x_2, \ldots, x_k$, the values of their joint moment-generating function are given by

$$E\left(e^{x_1 t_1 + x_2 t_2 + \cdots + x_k t_k}\right)$$

(a) Show for either the discrete case or the continuous case that the partial derivative of the joint moment-generating function with respect to $t_i$ at $t_1 = t_2 = \cdots = t_k = 0$ is $E(x_i)$.

(b) Show for either the discrete case or the continuous case that the second partial derivative of the joint moment-generating function with respect to $t_i$ and $t_j$, $i \neq j$, at $t_1 = t_2 = \cdots = t_k = 0$ is $E(x_i x_j)$.

(c) If two random variables have the joint density given by

$$f(x, y) = \begin{cases} e^{-x-y} & \text{for } x > 0,\; y > 0\\ 0 & \text{elsewhere} \end{cases}$$

find their joint moment-generating function and use it to determine the values of E(xy), E(x), E(y), and hence cov(x, y).


7. If the independent random variables $x_1$, $x_2$, and $x_3$ have the means 4, 9, 3 and the variances 3, 7, 5, find the mean and the variance of
(a) $y = 2x_1 - 3x_2 + 4x_3$;
(b) $z = x_1 + 2x_2 - x_3$.


8. Repeat both parts of Exercise 7, dropping the assumption of independence and adding the information that $\operatorname{cov}(x_1, x_2) = 1$, $\operatorname{cov}(x_2, x_3) = -2$, and $\operatorname{cov}(x_1, x_3) = -3$.


9. If the joint density of x and y is given by

$$f(x, y) = \begin{cases} \tfrac{1}{3}(x + y) & \text{for } 0 < x < 1,\; 0 < y < 2\\ 0 & \text{elsewhere} \end{cases}$$

find the variance of w = 3x + 4y − 5.
10. Prove Theorem 4.15.


11. Express var(x + y), var(x − y), and cov(x + y, x − y) in terms of the variances and covariance of x and y.

12. If $x_1$, $x_2$, and $x_3$ have the variances 5, 4, 7, and $\operatorname{cov}(x_1, x_2) = 3$, $\operatorname{cov}(x_1, x_3) = -2$, while $x_2$ and $x_3$ are independent, find the covariance of $y_1 = x_1 - 2x_2 + 3x_3$ and $y_2 = -2x_1 + 3x_2 + 4x_3$.

13. With reference to Exercise 7, find cov(y, z).
14. With reference to Exercise 1 on page 128, find the conditional mean and the conditional variance of x given y = −1.


15. With reference to Exercise 3 on page 129, find the conditional expectation of the random variable $u(z) = z^2$ given x = 1 and y = 2.


16. With reference to Exercise 6 on page 129, find the conditional mean and the 
conditional variance of y given x = 4. 1 

17. With reference to Example 3.22 on page 121 and part (b) of Exercise 9 on 
page 130, find the expected value of xix, given x, =. 


18. (a) Show that the conditional distribution function of the continuous random variable x given a < x < b is given by

$$F(x \mid a < x < b) = \begin{cases} 0 & \text{for } x \le a\\ \dfrac{F(x) - F(a)}{F(b) - F(a)} & \text{for } a < x < b\\ 1 & \text{for } x \ge b \end{cases}$$

(b) Differentiate the result of part (a) with respect to x to find the conditional probability density of x given a < x < b, and then show that

$$E[u(x) \mid a < x < b] = \frac{\int_a^b u(x) f(x)\,dx}{\int_a^b f(x)\,dx}$$


APPLIED EXERCISES 


19. A penny, which is unbalanced so that the probability of heads is 0.40, is 
tossed twice. What is the covariance of z, the number of heads obtained on 
the first toss, and w, the total number of heads obtained in the two tosses 
of the coin? 

20. The inside diameter of a cylindrical tube is a random variable with a mean 
of 3 inches and a standard deviation of 0.02 inch, the thickness of the tube 
is a random variable with a mean of 0.3 inch and a standard deviation of 
0.005 inch, and the two random variables are independent. Find the mean 
and the standard deviation of the outside diameter of the tube. 


21. The length of certain bricks is a random variable with a mean of 8 inches 
and a standard deviation of 0.1 inch, and the thickness of the mortar between 
two bricks is a random variable with a mean of 0.5 inch and a standard 
deviation of 0.03 inch. What are the mean and the standard deviation of the
length of a wall made of 50 of these bricks laid side by side, if we can assume 
that all the random variables involved are independent? 


22. If heads is a success when we flip a coin, getting a six is a success when we 
roll a die, and getting an ace is a success when we draw a card from an 


ordinary deck of 52 playing cards, find the mean and the standard deviation of the total number of successes when we

(a) flip a balanced coin, roll a balanced die, and then draw a card from a 
well-shuffled deck; 

(b) flip a balanced coin three times, roll a balanced die twice, and then 
draw a card from a well-shuffled deck. 


23. If we alternately flip a balanced coin and a coin which is loaded so that the
probability of getting heads is 0.45, what are the mean and the variance of 
the number of heads which we obtain in ten flips of these coins? 


24. With reference to Exercise 22 on page 116 and part (c) of Exercise 16 on
page 131, find the expected number of statistics texts given that one mathe- 
matics text is selected. 


25. With reference to Exercise 19 on page 132, by how much can a salesperson
who spends $12 on gasoline expect to be reimbursed? 


26. The length of time (in minutes) that a person talks on a telephone is a random variable having the probability density


(ог0<х=2 


f(x) = 


їогх>2 


© xja eix 


elsewhere 


With reference to part (b) of Exercise 18, find the expected length of a 
telephone conversation that has lasted at least one minute. 


REFERENCES 


Further information about the material in this chapter may be found in the mathematical 
statistics texts listed at the end of Chapter 3. 


Special Probability Distributions


5.1 INTRODUCTION


In this chapter we shall study some of the probability distributions which figure most prominently in statistical theory and in applications. We shall also study their parameters, that is, the quantities which are constants for particular distributions, but which can take on different values for different members of families of distributions of the same kind. The most common parameters are the lower moments, mainly μ and σ², and as we saw in the preceding chapter, there are essentially two ways in which they can be obtained: we can evaluate the necessary sums directly or we can work with moment-generating functions. Although it would seem logical to use in each case whichever method is simpler, we shall sometimes use both. In some instances this will be done because the results are needed later; in others it will merely serve to provide the reader with experience in the application of the respective mathematical techniques. Also, to keep the size of this chapter within bounds, many of the details will be left as exercises.


5.2 THE DISCRETE UNIFORM DISTRIBUTION


If a random variable can take on k different values with equal probabilities, we 
say that it has a discrete uniform distribution; symbolically, 
DEFINITION 5.1 A random variable x has a discrete uniform distribution, and it is referred to as a discrete uniform random variable, if and only if its probability distribution is given by

$$f(x) = \frac{1}{k} \quad\text{for } x = x_1, x_2, \ldots, x_k$$

where $x_i \neq x_j$ when $i \neq j$.


In accordance with Definitions 4.2 and 4.4, the mean and the variance of this distribution are

$$\mu = \sum_{i=1}^{k} x_i \cdot \frac{1}{k} \quad\text{and}\quad \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \cdot \frac{1}{k}$$

In the special case where $x_i = i$, the discrete uniform distribution becomes $f(x) = \frac{1}{k}$ for x = 1, 2, ..., k, and in this form it applies, for example, to the number of points we roll with a balanced die. The mean and the variance of this discrete uniform distribution, and its moment-generating function, are treated in Exercises 1 and 2 on page 183.
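The sums above can be evaluated exactly for a balanced die. Below is a minimal sketch that uses exact fractions so the results can be compared with the closed forms (k + 1)/2 and (k² − 1)/12 of Exercise 1. The function name is our own.

```python
from fractions import Fraction


def discrete_uniform_moments(values):
    """Mean and variance of f(x) = 1/k on k distinct values (Definition 5.1)."""
    values = list(values)
    k = len(values)
    if k == 0 or len(set(values)) != k:
        raise ValueError("need k >= 1 distinct values")
    p = Fraction(1, k)
    mu = sum(x * p for x in values)
    var = sum((x - mu) ** 2 * p for x in values)
    return mu, var


mu, var = discrete_uniform_moments(range(1, 7))  # balanced die, k = 6
print(mu, var)  # 7/2 35/12
```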


5.3 THE BERNOULLI DISTRIBUTION


If an experiment has two possible outcomes, "success" and "failure," and their probabilities are, respectively, θ and 1 − θ, then the number of successes, 0 or 1, has a Bernoulli distribution; symbolically,


DEFINITION 5.2 A random variable x has a Bernoulli distribution, and it is referred to as a Bernoulli random variable, if and only if its probability distribution is given by

$$f(x; \theta) = \theta^{x}(1 - \theta)^{1-x} \quad\text{for } x = 0, 1$$

Thus, f(0; θ) = 1 − θ and f(1; θ) = θ are combined into a single formula. Observe that we used the notation f(x; θ) to indicate explicitly that the Bernoulli distribution has the one parameter θ. Since the Bernoulli distribution is a special case of the distribution of Section 5.4, we shall not discuss it here in any detail.
In connection with the Bernoulli distribution, a success may be getting heads with a balanced coin, it may be catching pneumonia, it may be passing (or failing) an examination, or it may be losing a race. This inconsistent use of the word "success" is a carryover from the days when probability theory was applied only to games of chance (and one player's failure was the other's success). Also for this reason, we refer to an experiment to which the Bernoulli distribution applies as a Bernoulli trial, or simply a trial, and to sequences of such experiments as repeated trials.


5.4 THE BINOMIAL DISTRIBUTION


Repeated trials play a very important role in probability and statistics, especially when the number of trials is fixed, the parameter θ (the probability of a success) is the same for each trial, and the trials are all independent. As we shall see, there are several random variables that arise in connection with repeated trials. The one we shall study here concerns the total number of successes; others will be given in Section 5.5.

The theory which we shall discuss in this section has many applications; 
for instance, it applies if we want to know the probability of getting 5 heads in 
12 flips of a coin, the probability that 7 of 10 persons will recover from a tropical 
disease, or the probability that 35 of 80 persons will respond to a mail-order 
solicitation. However, this is the case only if each of the 10 persons has the same 
chance of recovering from the disease and their recoveries are independent (say, 
they are treated by different doctors in different hospitals), and if the probability 
of getting a reply to the mail-order solicitation is the same for each of the 80 
persons and there is independence (say, no two of them belong to the same 
household). 

To derive a formula for the probability of getting "x successes in n trials" under the stated conditions, observe that the probability of getting x successes and n − x failures in a specific order is $\theta^x(1-\theta)^{n-x}$. There is one factor θ for each success, one factor 1 − θ for each failure, and the x factors θ and n − x factors 1 − θ are all multiplied together by virtue of the assumption of independence. Since this probability applies to any sequence of n trials in which there are x successes and n − x failures, we have only to count how many sequences of this kind there are, and then multiply $\theta^x(1-\theta)^{n-x}$ by that number. Clearly, the number of ways in which we can select the x trials on which there is to be a success is $\binom{n}{x}$, and it follows that the desired probability for "x successes in n trials" is $\binom{n}{x}\theta^x(1-\theta)^{n-x}$.
x 
DEFINITION 5.3 A random variable x has a binomial distribution, and it is referred to as a binomial random variable, if and only if its probability distribution is given by

$$b(x; n, \theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x} \quad\text{for } x = 0, 1, 2, \ldots, n$$

Thus, the number of successes in n trials is a random variable having a binomial distribution with the parameters n and θ. The name "binomial distribution" derives from the fact that the values of b(x; n, θ) for x = 0, 1, 2, ..., n are the successive terms of the binomial expansion of $[(1-\theta) + \theta]^n$; this shows also that the sum of the probabilities equals 1, as it should.


EXAMPLE 5.1 


Find the probability of getting 5 heads and 7 tails in 12 flips of a balanced coin. 


Solution 


Substituting x = 5, n = 12, and θ = 1/2 into the formula for the binomial distribution, we get

$$b\left(5; 12, \tfrac12\right) = \binom{12}{5}\left(\tfrac12\right)^5\left(1 - \tfrac12\right)^{12-5}$$

and, looking up the value of $\binom{12}{5}$ in Table VII, we find that the result is $792\left(\tfrac12\right)^{12}$, or approximately 0.19. ▲
EXAMPLE 5.2 


Find the probability that 7 of 10 persons will recover from a tropical disease, 
given that the probability is 0.80 that any one of them will recover from the disease. 


Solution 


Substituting x = 7, n = 10, and θ = 0.80 into the formula for the binomial distribution, we get

$$b(7; 10, 0.80) = \binom{10}{7}(0.80)^7(1 - 0.80)^{10-7}$$

and, looking up the value of $\binom{10}{7}$ in Table VII, we find that the result is $120(0.80)^7(0.20)^3$, or approximately 0.20. ▲
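The formula of Definition 5.3 maps directly onto Python's `math.comb`. Below is a minimal sketch that checks that the probabilities sum to 1 and reproduces Examples 5.1 and 5.2. The function name `binomial_pmf` is our own.

```python
from math import comb


def binomial_pmf(x, n, theta):
    """b(x; n, theta) = C(n, x) * theta^x * (1 - theta)^(n - x), x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


# The terms of the expansion of [(1 - theta) + theta]^n add up to 1
print(round(sum(binomial_pmf(x, 12, 0.5) for x in range(13)), 10))  # 1.0
print(round(binomial_pmf(5, 12, 0.5), 4))   # Example 5.1: 0.1934
print(round(binomial_pmf(7, 10, 0.80), 4))  # Example 5.2: 0.2013
```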


If we tried to calculate the third probability asked for on page 177, the one concerning the responses to the mail-order solicitation, by substituting x = 35, n = 80, and, say, θ = 0.15, into the formula for the binomial distribution, we would find that this requires a prohibitive amount of work. In actual practice, binomial probabilities are rarely calculated directly, for they are tabulated extensively for various values of θ and n: in the National Bureau of Standards table for n = 2 to n = 49 and in the book by H. G. Romig for n = 50 to n = 100.* At the end of this book, Table I gives the values of b(x; n, θ), to four decimals, for n = 1 to n = 20 and θ = 0.05, 0.10, 0.15, ..., 0.45, and 0.50. To use this table when θ is greater than 0.50, we employ the identity


THEOREM 5.1

$$b(x; n, \theta) = b(n - x; n, 1 - \theta)$$


which the reader will be asked to prove in Exercise 4 on page 183. For instance, to find b(11; 18, 0.70), we look up b(7; 18, 0.30), getting 0.1376.
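The identity of Theorem 5.1 can be confirmed numerically. Counting successes with probability θ is the same as counting failures with probability 1 − θ. The sketch below evaluates both sides for the lookup just described. The helper function is our own.

```python
from math import comb


def binomial_pmf(x, n, theta):
    """b(x; n, theta) from Definition 5.3."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


# Theorem 5.1: b(x; n, theta) = b(n - x; n, 1 - theta)
lhs = binomial_pmf(11, 18, 0.70)
rhs = binomial_pmf(7, 18, 0.30)
print(round(lhs, 4), round(rhs, 4))  # 0.1376 0.1376
```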

There are also several ways in which binomial probabilities can be approximated when n is large. One of these will be mentioned in Section 5.7, and another in Section 6.6.

Let us now find formulas for the mean and the variance of the binomial distribution.


THEOREM 5.2 The mean and the variance of the binomial distribution are

$$\mu = n\theta \quad\text{and}\quad \sigma^2 = n\theta(1-\theta)$$


Proof. To determine the mean, let us directly evaluate the sum

$$\mu = \sum_{x=0}^{n} x\binom{n}{x}\theta^x(1-\theta)^{n-x} = \sum_{x=1}^{n}\frac{n!}{(x-1)!\,(n-x)!}\,\theta^x(1-\theta)^{n-x}$$

* These books are listed among the references on page 207. Also, there exists extensive 
computer software for obtaining printouts of binomial probabilities. 
where we omitted the term corresponding to x = 0, which is 0, and canceled the x against the first factor of x! = x(x − 1)! in the denominator of $\binom{n}{x}$. Then, factoring out the factor n in n! = n(n − 1)! and one factor θ, we get

$$\mu = n\theta\sum_{x=1}^{n}\binom{n-1}{x-1}\theta^{x-1}(1-\theta)^{n-x}$$

and, letting y = x − 1 and m = n − 1, this becomes

$$\mu = n\theta\sum_{y=0}^{m}\binom{m}{y}\theta^{y}(1-\theta)^{m-y} = n\theta$$

since the last summation is the sum of all the values of a binomial distribution with the parameters m and θ, and hence equal to 1.

To find expressions for $\mu_2'$ and then $\sigma^2$, let us make use of the fact that $E(x^2) = E[x(x-1)] + E(x)$, and evaluate $E[x(x-1)]$, duplicating for all practical purposes the steps used above. We thus get

$$\begin{aligned}
E[x(x-1)] &= \sum_{x=0}^{n} x(x-1)\binom{n}{x}\theta^x(1-\theta)^{n-x}\\
&= \sum_{x=2}^{n}\frac{n!}{(x-2)!\,(n-x)!}\,\theta^x(1-\theta)^{n-x}\\
&= n(n-1)\theta^2\sum_{x=2}^{n}\binom{n-2}{x-2}\theta^{x-2}(1-\theta)^{n-x}
\end{aligned}$$

and, letting y = x − 2 and m = n − 2, this becomes

$$E[x(x-1)] = n(n-1)\theta^2\sum_{y=0}^{m}\binom{m}{y}\theta^{y}(1-\theta)^{m-y} = n(n-1)\theta^2$$

Therefore,

$$\mu_2' = E[x(x-1)] + E(x) = n(n-1)\theta^2 + n\theta$$

and, finally,

$$\sigma^2 = \mu_2' - \mu^2 = n(n-1)\theta^2 + n\theta - n^2\theta^2 = n\theta(1-\theta)$$ ■
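The proof above evaluates the sums symbolically. As a cross-check, we can carry out the same sums directly with exact fractions and compare them with nθ and nθ(1 − θ). The sketch below is our own and is not taken from the text.

```python
from fractions import Fraction
from math import comb


def binomial_mean_var(n, theta):
    """Mean and variance of b(x; n, theta), summed directly over x = 0..n."""
    pmf = [comb(n, x) * theta ** x * (1 - theta) ** (n - x) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(pmf))
    second = sum(x * x * p for x, p in enumerate(pmf))  # mu'_2 = E(x^2)
    return mean, second - mean ** 2


n, theta = 12, Fraction(3, 10)
print(binomial_mean_var(n, theta))  # (Fraction(18, 5), Fraction(63, 25))
```

Here 18/5 = nθ = 3.6 and 63/25 = nθ(1 − θ) = 2.52, in agreement with Theorem 5.2.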


It should not have come as a surprise that the mean of the binomial distribution is given by the product nθ. After all, if a balanced coin is flipped 200 times, we expect (in the sense of a mathematical expectation) 200 · 1/2 = 100 heads and 100 tails; similarly, if a balanced die is rolled 240 times we expect 240 · 1/6 = 40 sixes, and if the probability is 0.80 that a person shopping at a department store will make a purchase, we would expect 400(0.80) = 320 of 400 persons shopping at the department store to make a purchase.

The formula for the variance of the binomial distribution, being a measure of variation, has many important applications, but to emphasize its significance let us consider the random variable $y = \frac{x}{n}$, where x is a random variable having a binomial distribution with the parameters n and θ. This random variable is the proportion of successes in n trials, and in Exercise 6 on page 183 the reader will be asked to prove the following result:


THEOREM 5.3 If x has a binomial distribution with the parameters n and θ and $y = \frac{x}{n}$, then

$$E(y) = \theta \quad\text{and}\quad \sigma_y^2 = \frac{\theta(1-\theta)}{n}$$


Now, if we apply Chebyshev's theorem with kσ = c (see Exercise 16 on page 158), we can assert that for any positive constant c, the probability is at least

$$1 - \frac{\theta(1-\theta)}{nc^2}$$

that the proportion of successes in n trials falls between θ − c and θ + c. Hence, when n → ∞, the probability approaches 1 that the proportion of successes will differ from θ by less than any arbitrary constant c. This result is called a law of large numbers, and it should be observed that it applies to the proportion of successes, not to their actual number. It is a fallacy to suppose that when n is large, the number of successes must necessarily be close to nθ.
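To see how the Chebyshev bound behaves as n grows, we can evaluate it for a fixed c. The sketch below does this for a balanced coin with c = 0.05, using exact fractions. The function name is our own. Note that the bound is crude and can be trivial (0) for small n.

```python
from fractions import Fraction


def chebyshev_lower_bound(n, theta, c):
    """Chebyshev lower bound on P(theta - c < x/n < theta + c) for binomial x.

    Uses Theorem 5.3 (variance theta(1 - theta)/n) with k*sigma = c.
    """
    return 1 - theta * (1 - theta) / (n * c ** 2)


theta, c = Fraction(1, 2), Fraction(1, 20)
for n in (100, 1_000, 10_000, 100_000):
    print(n, chebyshev_lower_bound(n, theta, c))
# 100 0
# 1000 9/10
# 10000 99/100
# 100000 999/1000
```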

Since the moment-generating function of the binomial distribution is easy 
to obtain, let us find it and use it to verify the results of Theorem 5.2. 


THEOREM 5.4 The moment-generating function of the binomial distribution is given by

$$M_x(t) = [1 + \theta(e^t - 1)]^n$$

Proof. By Definitions 4.6 and 5.3, we get

$$M_x(t) = \sum_{x=0}^{n} e^{xt}\binom{n}{x}\theta^x(1-\theta)^{n-x} = \sum_{x=0}^{n}\binom{n}{x}(\theta e^t)^x(1-\theta)^{n-x}$$

and by Theorem 1.9 on page 13 this summation is easily recognizable as the binomial expansion of $[\theta e^t + (1-\theta)]^n = [1 + \theta(e^t - 1)]^n$. ■


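If we differentiate $M_x(t)$ twice with respect to t, we obtain

$$\begin{aligned}
M_x'(t) &= n\theta e^t[1 + \theta(e^t - 1)]^{n-1}\\
M_x''(t) &= n\theta e^t[1 + \theta(e^t - 1)]^{n-1} + n(n-1)\theta^2 e^{2t}[1 + \theta(e^t - 1)]^{n-2}\\
&= n\theta e^t(1 - \theta + n\theta e^t)[1 + \theta(e^t - 1)]^{n-2}
\end{aligned}$$

and, upon substituting t = 0, we get $\mu_1' = n\theta$ and $\mu_2' = n\theta(1 - \theta + n\theta)$. Thus, $\mu = n\theta$ and $\sigma^2 = \mu_2' - \mu^2 = n\theta(1 - \theta + n\theta) - (n\theta)^2 = n\theta(1-\theta)$, which agrees with the formulas given in Theorem 5.2.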
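The derivatives at t = 0 can also be approximated numerically. This gives a quick sanity check on the algebra. The sketch below is our own and is not the text's method. It uses central finite differences with a step h = 1e-4, so the results are accurate only to several decimal places.

```python
import math


def binomial_mgf(t, n, theta):
    """M_x(t) = [1 + theta(e^t - 1)]^n (Theorem 5.4); expm1(t) = e^t - 1."""
    return (1 + theta * math.expm1(t)) ** n


def mgf_derivatives_at_zero(n, theta, h=1e-4):
    """Central-difference estimates of M'(0) = mu'_1 and M''(0) = mu'_2."""
    m_plus, m0, m_minus = (binomial_mgf(s, n, theta) for s in (h, 0.0, -h))
    first = (m_plus - m_minus) / (2 * h)
    second = (m_plus - 2 * m0 + m_minus) / h ** 2
    return first, second


n, theta = 10, 0.3
mu1, mu2 = mgf_derivatives_at_zero(n, theta)
print(round(mu1, 4), round(mu2 - mu1 ** 2, 4))  # 3.0 2.1, i.e. n*theta and n*theta*(1 - theta)
```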

From the work of this section it may seem easier to find the moments of 
the binomial distribution with the moment-generating function than to evaluate 
them directly, but it should be apparent that the differentiation becomes fairly 
involved if we want to determine, say, $\mu_3'$ or $\mu_4'$. Actually, there is an even easier way of determining the moments of the binomial distribution; it is based
on its factorial moment-generating function, which is explained in Exercise 11 on 
page 184. 
THEORETICAL EXERCISES

1. If x has the discrete uniform distribution $f(x) = \frac{1}{k}$ for x = 1, 2, ..., k, show that
(a) its mean is $\mu = \frac{k+1}{2}$;
(b) its variance is $\sigma^2 = \frac{k^2 - 1}{12}$.
(Hint: Refer to Appendix II.)

2. If x has the discrete uniform distribution $f(x) = \frac{1}{k}$ for x = 1, 2, ..., k, show that its moment-generating function is given by

$$M_x(t) = \frac{e^t(1 - e^{kt})}{k(1 - e^t)}$$

Also find the mean of this distribution by evaluating $\lim_{t\to 0} M_x'(t)$, and compare the result with that obtained in part (a) of Exercise 1.


3. The Bernoulli distribution was not studied in any detail in Section 5.3 because it can be looked upon as a binomial distribution with n = 1. Show that for the Bernoulli distribution $\mu_r' = \theta$ for r = 1, 2, 3, ...,
(a) by evaluating the sum $\sum_{x=0}^{1} x^r \cdot f(x; \theta)$;
(b) by letting n = 1 in the moment-generating function of the binomial distribution and examining its Maclaurin's series.

Also show that
(c) $\alpha_3 = \dfrac{1 - 2\theta}{\sqrt{\theta(1-\theta)}}$, where $\alpha_3$ is the measure of skewness defined in Exercise 10 on page 157;
(d) $\alpha_4 = \dfrac{1 - 3\theta(1-\theta)}{\theta(1-\theta)}$, where $\alpha_4$ is the measure of peakedness defined in Exercise 11 on page 158.
4. Prove Theorem 5.1.


5. If $B(x; n, \theta) = \sum_{k=0}^{x} b(k; n, \theta)$ for x = 0, 1, 2, ..., n, show that
(a) $b(x; n, \theta) = B(x; n, \theta) - B(x - 1; n, \theta)$;
(b) $b(x; n, \theta) = B(n - x; n, 1 - \theta) - B(n - x - 1; n, 1 - \theta)$;
(c) $B(x; n, \theta) = 1 - B(n - x - 1; n, 1 - \theta)$.


6. Prove Theorem 5.3. 
7. ... usually be simplified by first calculating b(0; n, θ) and then using the recursion formula

$$b(x + 1; n, \theta) = \frac{\theta(n - x)}{(x + 1)(1 - \theta)} \cdot b(x; n, \theta)$$

8. ... tion with n = 7 and θ = 0.25.

9. Use the recursion formula of Exercise 7 to show that for θ = 1/2 the binomial distribution has a maximum at x = n/2 when n is even, and maxima at x = (n − 1)/2 and x = (n + 1)/2 when n is odd. ... b(x; n, θ) a maximum?


10. In the proof of Theorem 5.2 we determined the quantity E[x(x − 1)], called the second factorial moment; in general, the rth factorial moment of x is given by

$$\mu_{(r)}' = E[x(x-1)(x-2)\cdots(x-r+1)]$$

Express $\mu_2'$, $\mu_3'$, and $\mu_4'$ in terms of factorial moments.


11. The factorial moment-generating function of a discrete random variable x is given by

$$F_x(t) = E(t^x) = \sum_{x} t^x \cdot f(x)$$

(a) Show that the rth derivative of $F_x(t)$ with respect to t at t = 1 is $\mu_{(r)}'$, the rth factorial moment defined in Exercise 10.
(b) Show that for the Bernoulli distribution the factorial moment-generating function is given by $F_x(t) = 1 - \theta + \theta t$, and hence that $\mu_{(1)}' = \theta$ and $\mu_{(r)}' = 0$ for r > 1.
(c) Find the factorial moment-generating function of the binomial distribution and use it to find μ and σ².


12. If we let a = −μ in the first part of Theorem 4.10, we get

$$M_{x-\mu}(t) = e^{-\mu t} \cdot M_x(t)$$

(a) Show that the rth derivative of $M_{x-\mu}(t)$ with respect to t at t = 0 gives the rth moment about the mean of x.
(b) Find such a generating function for moments about the mean of the binomial distribution, and verify that the second derivative at t = 0 is nθ(1 − θ).
(c) Use the result of part (b) to show that for the binomial distribution

$$\alpha_3 = \frac{1 - 2\theta}{\sqrt{n\theta(1-\theta)}}$$

where $\alpha_3$ is the measure of skewness defined in Exercise 10 on page 157. What can we conclude about the skewness of the binomial distribution when (i) θ = 1/2, and (ii) n is large?


APPLIED EXERCISES 




13. A multiple-choice test consists of eight questions and three answers to each
question (of which only one is correct). If a student answers each question 
by rolling a balanced die and checking the first answer if he gets a 1 or 2, 
the second answer if he gets a 3 or 4, and the third answer if he gets a 5 or 
6, what is the probability that he will get exactly four correct answers? 


14. An automobile safety engineer claims that 1 in 10 automobile accidents is
due to driver fatigue. Using the formula for the binomial distribution and 
rounding to four decimals, what is the probability that at least 3 of 5 
automobile accidents are due to driver fatigue? 


15. If 40 percent of the mice used in an experiment will become very aggressive
within one minute after having been administered an experimental drug, find 
the probability that exactly six of fifteen mice which have been administered 
the drug will become very aggressive within one minute, using 

(a) the formula for the binomial distribution; 

(b) Table I. 

16. In a certain city, incompatibility is given as the legal reason in 70 percent of
all divorce cases. Find the probability that five of the next six divorce cases 
filed in this city will claim incompatibility as the reason, using 

(a) the formula for the binomial distribution; 

(b) Table I. 

17. A social scientist claims that only 50 percent of all high school seniors capable
of doing college work actually go to college. Assuming that this claim is true, 
use Table I to find the probabilities that among 18 high school seniors capable 
of doing college work 

(a) exactly 10 will go to college; 

(b) at least 10 will go to college;

(c) at most eight will go to college. 


18. A quality control engineer wants to check whether (in accordance with
specifications) 95 percent of the electronic components shipped by her com- 
pany are without flaws. To this end, she randomly selects 20 from each large 
lot ready to be shipped and passes the lot if they are all without flaws; 
otherwise each component in the lot is checked. Assuming that the lots are 
so large that we can use the binomial distribution as an approximation, use 
Table I to find the probabilities that she will commit the error of 

(a) holding a lot for a complete check even though 95 percent of the components are without flaws;

(b) passing a lot even though only 90 percent of the components are without flaws;

(c) passing a lot even though only 80 percent of the components are without flaws;

(d) passing a lot even though only 70 percent of the components are without flaws.


19. In planning the operation of a new school, one school board member claims
that 4 out of 5 newly hired teachers will stay with the school for more than 
a year, while another school board member claims that it would be correct 
to say 3 out of 5. In the past, the two board members have been about equally 
reliable in their predictions, so that in the absence of any other information 
we would assign their judgments equal weight. If one or the other has to be 
right, what probabilities would we assign to their claims if it were found that 
11 of 12 newly hired teachers stayed with the school for more than a year? 


20. Use Chebyshev's theorem and Theorem 5.3 to verify that the probability is at least 35/36 that
(a) in 900 flips of a balanced coin the proportion of heads will be between 
0.40 and 0.60; 


(b) in 10,000 flips of a balanced coin the proportion of heads will be between 
0.47 and 0.53; 


(c) in 1,000,000 flips of a balanced coin the proportion of heads will be 
between 0.497 and 0.503. 


Note that this serves to illustrate the law of large numbers. 


5.5 THE NEGATIVE BINOMIAL AND GEOMETRIC DISTRIBUTIONS


In connection with repeated Bernoulli trials, we are sometimes interested in the 
number of the trial on which the kth success occurs. For instance, we may be 
interested in the probability that the tenth child exposed to a contagious disease 
will be the third to catch it, the probability that the fifth person to hear a rumor 
will be the first one to believe it, or the probability that a burglar will be caught
for the second time on his eighth job. 

If the kth success is to occur on the xth trial, there must be k − 1 successes on the first x − 1 trials, and the probability for this is

$$b(k - 1; x - 1, \theta) = \binom{x-1}{k-1}\theta^{k-1}(1-\theta)^{x-k}$$

The probability of a success on the xth trial is θ, and the probability that the kth success occurs on the xth trial is, therefore,

$$\theta \cdot b(k - 1; x - 1, \theta) = \binom{x-1}{k-1}\theta^{k}(1-\theta)^{x-k}$$


DEFINITION 5.4 A random variable x has a negative binomial distribution, and it is referred to as a negative binomial random variable, if and only if its probability distribution is given by

$$b^*(x; k, \theta) = \binom{x-1}{k-1}\theta^{k}(1-\theta)^{x-k} \quad\text{for } x = k, k+1, k+2, \ldots$$


Thus, the number of the trial on which the kth success occurs is a random variable having a negative binomial distribution with the parameters k and θ. The name "negative binomial distribution" derives from the fact that the values of b*(x; k, θ) for x = k, k + 1, k + 2, ..., are the successive terms of the binomial expansion of $\left(\frac{1}{\theta} - \frac{1-\theta}{\theta}\right)^{-k}$.* In the literature of statistics, negative binomial distributions are also referred to as binomial waiting-time distributions or as Pascal distributions.


EXAMPLE 5.3 
If the probability is 0.40 that a child exposed to a certain contagious disease will 
catch it, what is the probability that the tenth child exposed to the disease will 
be the third to catch it?




* Binomial expansions with negative exponents are explained in the book by W. 
Feller listed among the references on page 72. 
Solution

Substituting x = 10, k = 3, and θ = 0.40 into the formula for the negative binomial distribution, we get

$$b^*(10; 3, 0.40) = \binom{9}{2}(0.40)^3(0.60)^7 = 0.0645$$ ▲


When a table of binomial probabilities is available, the determination of negative binomial probabilities can generally be simplified by making use of the identity


THEOREM 5.5

$$b^*(x; k, \theta) = \frac{k}{x} \cdot b(k; x, \theta)$$

which the reader will be asked to verify in Exercise 2 on page 198.
EXAMPLE 5.4

Use Theorem 5.5 and Table I to rework Example 5.3.

Solution

Substituting x = 10, k = 3, and θ = 0.40 into the formula of Theorem 5.5, we get

$$b^*(10; 3, 0.40) = \frac{3}{10} \cdot b(3; 10, 0.40) = \frac{3}{10}(0.2150) = 0.0645$$ ▲
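Examples 5.3 and 5.4 compute the same probability in two ways. The sketch below evaluates both the direct formula of Definition 5.4 and the identity of Theorem 5.5. The helper names are our own.

```python
from math import comb


def neg_binomial_pmf(x, k, theta):
    """b*(x; k, theta) = C(x-1, k-1) theta^k (1-theta)^(x-k), x = k, k+1, ..."""
    if x < k:
        return 0.0
    return comb(x - 1, k - 1) * theta ** k * (1 - theta) ** (x - k)


def binomial_pmf(x, n, theta):
    """b(x; n, theta) from Definition 5.3."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


direct = neg_binomial_pmf(10, 3, 0.40)            # Example 5.3
via_theorem = 3 / 10 * binomial_pmf(3, 10, 0.40)  # Theorem 5.5, Example 5.4
print(round(direct, 4), round(via_theorem, 4))  # 0.0645 0.0645
```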


Moments of the negative binomial distribution may be obtained by proceeding as in the proof of Theorem 5.2. For the mean and the variance we get
THEOREM 5.6 The mean and the variance of the negative binomial distribution are

$$\mu = \frac{k}{\theta} \quad\text{and}\quad \sigma^2 = \frac{k}{\theta}\left(\frac{1}{\theta} - 1\right)$$

as the reader will be asked to verify in Exercise 3 on page 198.
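Theorem 5.6 can be checked numerically by truncating the infinite sums over x = k, k + 1, .... The sketch below assumes that the tail beyond a cutoff `x_max` is negligible. That holds here because $(1 - \theta)^{x}$ becomes vanishingly small. Both the cutoff and the function name are our own choices.

```python
from math import comb


def neg_binomial_moments(k, theta, x_max=1000):
    """Mean and variance of b*(x; k, theta), summing x = k..x_max.

    Terms beyond x_max are neglected; this is safe only when
    (1 - theta)**x_max is negligibly small.
    """
    mean = second = 0.0
    for x in range(k, x_max + 1):
        p = comb(x - 1, k - 1) * theta ** k * (1 - theta) ** (x - k)
        mean += x * p
        second += x * x * p
    return mean, second - mean ** 2


k, theta = 3, 0.4
print(neg_binomial_moments(k, theta))          # close to (7.5, 11.25)
print(k / theta, k / theta * (1 / theta - 1))  # Theorem 5.6 formulas
```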
Since the negative binomial distribution with k = 1 has many important
applications, it is given a special name; it is called the geometric distribution. 


DEFINITION 5.5 A random variable x has a geometric distribution, and it is referred to as a geometric random variable, if and only if its probability distribution is given by

$$g(x; \theta) = \theta(1 - \theta)^{x-1} \quad\text{for } x = 1, 2, 3, \ldots$$


EXAMPLE 5.5 


If the probability is 0.75 that an applicant for a driver's license will pass the road test on any given try, what is the probability that an applicant will finally pass the test on the fourth try?


Solution 
Substituting x = 4 and θ = 0.75 into the formula for the geometric distribution, we get

$$\begin{aligned}
g(4; 0.75) &= 0.75(1 - 0.75)^{4-1}\\
&= 0.75(0.25)^3\\
&= 0.0117
\end{aligned}$$


Of course, this result is based on the assumption that the trials are all 
independent, and there may be some question here about its validity. A 
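The geometric distribution is the negative binomial distribution with k = 1. Its probabilities are therefore easy to compute directly. The sketch below reproduces Example 5.5. It also adds up the first four terms, which gives the probability of passing within four tries, 1 − (1 − θ)⁴. The function name is our own.

```python
def geometric_pmf(x, theta):
    """g(x; theta) = theta * (1 - theta)**(x - 1) for x = 1, 2, 3, ... (Definition 5.5)."""
    if x < 1:
        return 0.0
    return theta * (1 - theta) ** (x - 1)


p4 = geometric_pmf(4, 0.75)
print(p4)  # 0.01171875, i.e. about 0.0117 as in Example 5.5

# Probability of passing on one of the first four tries: 1 - (1 - theta)**4
print(sum(geometric_pmf(x, 0.75) for x in range(1, 5)))  # 0.99609375
```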
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5.6 THE HYPERGEOMETRIC DISTRIBUTION 


In Chapter 2 we used sampling with and without replacement to illustrate the
multiplication rules for independent and dependent events. To obtain a formula
analogous to that of the binomial distribution which applies to sampling without
replacement, in which case the trials are not independent, let us consider a set
of N elements of which k are looked upon as successes and the other N − k as
failures. As in connection with the binomial distribution, we are interested in the
probability of getting x successes in n trials, but now we are choosing, without
replacement, n of the N elements contained in the set.


There are $\binom{k}{x}$ ways of choosing x of the k successes and $\binom{N-k}{n-x}$ ways
of choosing n − x of the N − k failures, and, hence, $\binom{k}{x}\binom{N-k}{n-x}$ ways of
choosing x successes and n − x failures. Since there are $\binom{N}{n}$ ways of choosing
n of the N elements in the set and we shall assume that they are all equally
likely (which is what we mean when we say that the selection is random), it
follows from Theorem 2.2 on page 39 that the probability of “x successes in n
trials” is $\binom{k}{x}\binom{N-k}{n-x} \Big/ \binom{N}{n}$.


DEFINITION 5.6 A random variable x has a hypergeometric distribution, and
it is referred to as a hypergeometric random variable, if and only if its
probability distribution is given by

$$h(x; n, N, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} \qquad \text{for } x = 0, 1, 2, \ldots, n$$

where x ≤ k and n − x ≤ N − k.


Thus, for sampling without replacement, the number of successes in n trials is
a random variable having a hypergeometric distribution with the parameters n,
N, and k.


EXAMPLE 5.6 


As part of an air pollution survey, an inspector decides to examine the exhaust
of 6 of a company’s 24 trucks. If 4 of the company’s trucks emit excessive amounts
of pollutants, what is the probability that none of them will be included in the
inspector’s sample?
Solution


Substituting x = 0, n = 6, N = 24, and k = 4 into the formula for the
hypergeometric distribution, we get

$$h(0; 6, 24, 4) = \frac{\binom{4}{0}\binom{20}{6}}{\binom{24}{6}} = 0.2880$$ ▲
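The formula of Definition 5.6 translates directly into code with Python's exact integer binomial coefficients. The sketch below, using a hypothetical helper `hypergeom_pmf`, reproduces Example 5.6 and checks that the probabilities over all possible values of x add up to 1.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, N: int, k: int) -> float:
    """h(x; n, N, k): probability of x successes in a sample of n drawn without
    replacement from N elements, k of which are successes."""
    if x < 0 or x > n or x > k or n - x > N - k:
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Example 5.6: none of the 4 polluting trucks among the 6 inspected (N = 24)
p0 = hypergeom_pmf(0, 6, 24, 4)
print(round(p0, 4))  # 0.288, i.e. 0.2880

# The probabilities over all possible x add up to 1
total = sum(hypergeom_pmf(x, 6, 24, 4) for x in range(0, 7))
print(total)
```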


The method by which we find the mean and the variance of the 
hypergeometric distribution is very similar to that employed in the proof of 
Theorem 5.2. 


THEOREM 5.7 The mean and the variance of the hypergeometric distribution
are

$$\mu = \frac{nk}{N} \qquad \text{and} \qquad \sigma^2 = \frac{nk(N-k)(N-n)}{N^2(N-1)}$$


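Proof. To determine the mean, let us directly evaluate the sum

$$\mu = \sum_{x=0}^{n} x \cdot \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} = \sum_{x=1}^{n} \frac{k!}{(x-1)!\,(k-x)!} \cdot \frac{\binom{N-k}{n-x}}{\binom{N}{n}}$$

where we omitted the term corresponding to x = 0, which is 0, and cancelled
the x against the first factor of x! = x(x − 1)! in the denominator of $\binom{k}{x}$.
Then, factoring out $k \big/ \binom{N}{n}$, we get

$$\mu = \frac{k}{\binom{N}{n}} \sum_{x=1}^{n} \binom{k-1}{x-1}\binom{N-k}{n-x}$$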
and, letting y = x − 1 and m = n − 1, this becomes

$$\mu = \frac{k}{\binom{N}{n}} \sum_{y=0}^{m} \binom{k-1}{y}\binom{N-k}{m-y}$$

Finally, using Theorem 1.12 on page 16, we get

$$\mu = \frac{k}{\binom{N}{n}}\binom{N-1}{m} = \frac{k}{\binom{N}{n}}\binom{N-1}{n-1} = \frac{nk}{N}$$


To obtain the formula for σ², we proceed as on page 180, namely, by
first evaluating E[x(x − 1)] and then making use of the fact that E(x²) =
E[x(x − 1)] + E(x) and, hence, σ² = E[x(x − 1)] + E(x) − μ². Leaving
it to the reader in Exercise 10 on page 199 to show that

$$E[x(x-1)] = \frac{k(k-1)n(n-1)}{N(N-1)}$$

we thus obtain

$$\sigma^2 = \frac{k(k-1)n(n-1)}{N(N-1)} + \frac{nk}{N} - \left(\frac{nk}{N}\right)^2 = \frac{nk(N-k)(N-n)}{N^2(N-1)}$$
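Because the hypergeometric distribution has finite support, Theorem 5.7 can be verified exactly, with no rounding at all, by summing the probabilities as rational numbers. The sketch below uses Python's `fractions.Fraction`; the parameter triples are taken from Examples 5.6 and 5.7, and the helper name `hypergeom_mean_var` is hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_mean_var(n: int, N: int, k: int):
    """Exact mean and variance of h(x; n, N, k) by direct summation."""
    total = comb(N, n)
    support = range(max(0, n - (N - k)), min(n, k) + 1)
    pmf = {x: Fraction(comb(k, x) * comb(N - k, n - x), total) for x in support}
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var


for n, N, k in [(6, 24, 4), (5, 120, 80)]:
    mean, var = hypergeom_mean_var(n, N, k)
    # Compare with the closed forms of Theorem 5.7
    assert mean == Fraction(n * k, N)
    assert var == Fraction(n * k * (N - k) * (N - n), N**2 * (N - 1))
    print(n, N, k, mean, var)
```

For n = 6, N = 24, k = 4 this gives μ = 1 and σ² = 15/23, exactly as the theorem predicts.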


Since the moment-generating function of the hypergeometric distribution 
is fairly complicated, it will not be treated in this book. It may be found, however, 
in the book by M. G. Kendall and A. Stuart listed among the references on 
page 133.

When N is large and n is relatively small compared to N (the usual rule
of thumb is that n should not exceed 5 percent of N), there is not much difference
between sampling with replacement and sampling without replacement, and the
formula for the binomial distribution with the parameters n and θ = k/N may be
used to approximate hypergeometric probabilities.


EXAMPLE 5.7 


Among the 120 applicants for a job, only 80 are actually qualified. If 5 of these
applicants are randomly selected for an “in-depth” interview, find the probability
that only 2 of the 5 will be qualified for the job by using (a) the hypergeometric
distribution, and (b) the binomial distribution as an approximation.


Solution 


(a) Substituting x = 2, n = 5, N = 120, and k = 80 into the formula for
the hypergeometric distribution, we get

$$h(2; 5, 120, 80) = \frac{\binom{80}{2}\binom{40}{3}}{\binom{120}{5}} = 0.164$$

rounded to three decimals;
(b) substituting x = 2, n = 5, and θ = 80/120 = 2/3 into the formula for the
binomial distribution, we get

$$b\left(2; 5, \tfrac{2}{3}\right) = \binom{5}{2}\left(\frac{2}{3}\right)^2\left(\frac{1}{3}\right)^3 = 0.165$$

rounded to three decimals. As can be seen from these results, the
approximation is very close. ▲
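The comparison in Example 5.7 is easy to reproduce. In the sketch below (helper names hypothetical) the sample is 5/120, or about 4.2 percent of the population, just inside the 5 percent rule of thumb, which is why the binomial value with θ = k/N lands so close to the exact hypergeometric one.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, N: int, k: int) -> float:
    """h(x; n, N, k), sampling without replacement."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x: int, n: int, theta: float) -> float:
    """b(x; n, theta), sampling with replacement."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


exact = hypergeom_pmf(2, 5, 120, 80)
approx = binom_pmf(2, 5, 80 / 120)  # theta = k/N
print(round(exact, 3), round(approx, 3))  # 0.164 0.165
```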


5.7 THE POISSON DISTRIBUTION


When n is large, the calculation of binomial probabilities with the use of the
formula of Definition 5.3 will usually involve a prohibitive amount of work. For
instance, to calculate the probability that 18 of 3,000 persons watching a parade
on a very hot summer day will suffer from heat exhaustion, we first have to
determine $\binom{3000}{18}$, and if the probability is 0.005 that any one of the 3,000
persons watching the parade will suffer from heat exhaustion, we also have to
calculate the value of $(0.005)^{18}(0.995)^{2982}$.

In this section we shall present a probability distribution which can be used
to approximate binomial probabilities of this kind. Specifically, we shall investigate
the limiting form of the binomial distribution when n → ∞, θ → 0, while nθ
remains constant. Letting this constant be λ, that is, nθ = λ and, hence, θ = λ/n,
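we can write

$$b(x; n, \theta) = \binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\left(\frac{\lambda}{n}\right)^x\left(1 - \frac{\lambda}{n}\right)^{n-x}$$

Then, if we divide one of the x factors n in $(\lambda/n)^x$ into each factor of the
product $n(n-1)(n-2)\cdots(n-x+1)$ and write

$$\left(1 - \frac{\lambda}{n}\right)^{n-x} = \left[\left(1 - \frac{\lambda}{n}\right)^{-n/\lambda}\right]^{-\lambda}\left(1 - \frac{\lambda}{n}\right)^{-x}$$

we obtain

$$\frac{1\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)}{x!}\,\lambda^x\left[\left(1 - \frac{\lambda}{n}\right)^{-n/\lambda}\right]^{-\lambda}\left(1 - \frac{\lambda}{n}\right)^{-x}$$

Finally, if we let n → ∞ while x and λ remain fixed, we find that

$$1\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) \to 1, \qquad \left(1 - \frac{\lambda}{n}\right)^{-x} \to 1, \qquad \left(1 - \frac{\lambda}{n}\right)^{-n/\lambda} \to e$$

and, hence, that the limiting distribution becomes

$$p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \qquad \text{for } x = 0, 1, 2, \ldots$$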
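The limit can also be watched numerically. This sketch holds nθ = λ fixed (λ = 2 and x = 3 are arbitrary illustrative choices) and lets n grow; the gap between b(x; n, λ/n) and p(x; λ) shrinks roughly in proportion to 1/n. The helper names are hypothetical.

```python
from math import comb, exp, factorial


def binom_pmf(x: int, n: int, theta: float) -> float:
    """b(x; n, theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lam), the limiting form derived above."""
    return lam**x * exp(-lam) / factorial(x)


lam, x = 2.0, 3  # n * theta = lam is held fixed as n grows
gaps = []
for n in (10, 100, 1000, 10000):
    gap = abs(binom_pmf(x, n, lam / n) - poisson_pmf(x, lam))
    gaps.append(gap)
    print(n, gap)
```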


DEFINITION 5.7 A random variable x has a Poisson distribution, and it is
referred to as a Poisson random variable, if and only if its probability
distribution is given by

$$p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \qquad \text{for } x = 0, 1, 2, \ldots$$
Thus, in the limit when n → ∞, θ → 0, and nθ = λ remains constant, the number
of successes is a random variable having a Poisson distribution with the parameter
λ. This distribution is named after the French mathematician Simeon Poisson
(1781-1840). In general, the Poisson distribution will provide a good approximation
to binomial probabilities when n ≥ 20 and θ ≤ 0.05. When n ≥ 100 and
nθ ≤ 10, the approximation will generally be excellent.


EXAMPLE 5.8 


If the probability is 0.005 that any one person attending a parade on a very hot 
day will suffer from heat exhaustion, what is the probability that 18 of the 3,000 
persons attending the parade will suffer from heat exhaustion? 


Solution 
Substituting x = 18 and λ = 3,000(0.005) = 15 into the formula for the
Poisson distribution, we get

$$p(18; 15) = \frac{15^{18} e^{-15}}{18!}$$

Since this is tedious to evaluate, we refer instead to Table II and find that
the answer is p(18; 15) = 0.0706. ▲
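With a computer the "tedious" evaluation is immediate, and so is the exact binomial probability that the Poisson value approximates. The sketch below computes both; Python's arbitrary-precision integers handle $\binom{3000}{18}$ exactly before it is converted to a float.

```python
from math import comb, exp, factorial

n, theta, x = 3000, 0.005, 18
lam = n * theta  # lambda = 15

# Poisson value, as read from Table II in the text
poisson = lam**x * exp(-lam) / factorial(x)

# Exact binomial value b(18; 3000, 0.005) that the Poisson value approximates
binomial = comb(n, x) * theta**x * (1 - theta) ** (n - x)

print(round(poisson, 4))         # 0.0706
print(abs(binomial - poisson))   # small: the approximation is close
```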


Having derived the Poisson distribution as a limiting form of the binomial
distribution, we can obtain formulas for its mean and its variance by applying
the same limiting conditions (n → ∞, θ → 0, and nθ = λ remains constant) to
the mean and the variance of the binomial distribution. For the mean we get
μ = nθ = λ and for the variance we get σ² = nθ(1 − θ) = λ(1 − θ), which
approaches λ when θ → 0.


THEOREM 5.8 The mean and the variance of the Poisson distribution are
given by

$$\mu = \lambda \qquad \text{and} \qquad \sigma^2 = \lambda$$


These results can also be obtained by evaluating the necessary sums (see
Exercise 16 on page 200) or by working with the moment-generating function.
THEOREM 5.9 The moment-generating function of the Poisson distribution
is given by

$$M_x(t) = e^{\lambda(e^t - 1)}$$


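Proof. By Definitions 4.6 and 5.7,

$$M_x(t) = \sum_{x=0}^{\infty} e^{xt} \cdot \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!}$$

where $\sum_{x=0}^{\infty} \frac{z^x}{x!}$ can be recognized as the Maclaurin's series of $e^z$ with
$z = \lambda e^t$. Thus,

$$M_x(t) = e^{-\lambda} \cdot e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$

Then, if we differentiate $M_x(t)$ twice with respect to t, we get

$$M_x'(t) = \lambda e^t e^{\lambda(e^t - 1)}$$

$$M_x''(t) = \lambda e^t e^{\lambda(e^t - 1)} + \lambda^2 e^{2t} e^{\lambda(e^t - 1)}$$

so that $\mu_1' = M_x'(0) = \lambda$ and $\mu_2' = M_x''(0) = \lambda + \lambda^2$. Thus, μ = λ and
$\sigma^2 = \mu_2' - \mu^2 = (\lambda + \lambda^2) - \lambda^2 = \lambda$, which agrees with Theorem 5.8.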
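The proof can be mirrored numerically: a truncated version of the series should match the closed form of Theorem 5.9, and finite differences of $M_x(t)$ at t = 0 should recover μ and σ². In this sketch λ = 4, t = 0.3, and the step h = 0.001 are arbitrary illustrative choices; the series terms are built recursively so that no huge factorial has to be converted to a float.

```python
from math import exp


def poisson_mgf_series(t: float, lam: float, terms: int = 150) -> float:
    """Truncated sum of e^{xt} p(x; lam), with terms (lam e^t)^x / x! built one by one."""
    z = lam * exp(t)
    term, total = 1.0, 1.0  # the x = 0 term of sum z**x / x!
    for x in range(1, terms):
        term *= z / x
        total += term
    return exp(-lam) * total


def poisson_mgf(t: float, lam: float) -> float:
    """Closed form of Theorem 5.9."""
    return exp(lam * (exp(t) - 1))


lam, h = 4.0, 1e-3
print(poisson_mgf_series(0.3, lam), poisson_mgf(0.3, lam))

# Central differences at t = 0 approximate M'(0) and M''(0)
m1 = (poisson_mgf(h, lam) - poisson_mgf(-h, lam)) / (2 * h)
m2 = (poisson_mgf(h, lam) - 2 * poisson_mgf(0, lam) + poisson_mgf(-h, lam)) / h**2
print(m1, m2 - m1**2)  # both approximately lambda = 4
```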

Although the Poisson distribution has been derived as a limiting form of
the binomial distribution, it has many applications which have no direct connection
with binomial distributions. For example, the Poisson distribution can serve
as a model for the number of successes that occur during a given time interval
or in a specified region when (1) the numbers of successes occurring in non-
overlapping time intervals or regions are independent; (2) the probability of a
single success occurring in a very short time interval or in a very small region is
proportional to the length of the time interval or the size of the region; and (3)
the probability of more than one success occurring in such a short time interval
or falling in such a small region is negligible. Hence, a Poisson distribution might
describe the number of telephone calls per hour received by an office, the number
of typing errors per page, or the number of bacteria in a given culture, when the
expected number of successes, λ, for the given time interval or specified region is
known.
EXAMPLE 5.9


The average number of trucks arriving on any one day at a truck depot in a
certain city is known to be 12. What is the probability that on a given day fewer
than 9 trucks arrive at this depot?


Solution 


Let x be the number of trucks arriving on a given day. Then, using Table
II with λ = 12, we get

$$P(x < 9) = \sum_{x=0}^{8} p(x; 12) = 0.1550$$ ▲
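The Table II lookup in Example 5.9 amounts to adding nine Poisson probabilities, which the following sketch does directly (helper names hypothetical). Note that "fewer than 9" means x = 0, 1, ..., 8.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lam) of Definition 5.7."""
    return lam**x * exp(-lam) / factorial(x)


def poisson_cdf(x_max: int, lam: float) -> float:
    """P(x <= x_max): sum of p(x; lam) for x = 0, 1, ..., x_max."""
    return sum(poisson_pmf(x, lam) for x in range(x_max + 1))


# Example 5.9: P(x < 9) = P(x <= 8) with lambda = 12
print(round(poisson_cdf(8, 12), 4))  # 0.155, i.e. 0.1550
```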


If, in a situation where the conditions above apply, successes occur at a
mean rate of α per unit time or per unit region, then the number of
successes in an interval of t units of time or t units of the specified region is a
Poisson random variable with mean λ = αt (see Exercise 14 on page 199).
Therefore, the number of successes, x, in a time interval of length t units or a
region of size t units has the Poisson distribution

$$p(x; \alpha t) = \frac{e^{-\alpha t}(\alpha t)^x}{x!} \qquad \text{for } x = 0, 1, 2, \ldots$$


EXAMPLE 5.10 


A certain type of upholstery fabric has, on the average, 2 defects per 10 square
yards. If one assumes a Poisson distribution, what is the probability that a
30-square-yard bolt of this fabric will have 4 or more defects?


Solution 
Let x denote the number of defects in a 30-square-yard bolt of the fabric.
Then, since the unit of area is 10 square yards, we have

$$\lambda = \alpha t = (2)(3) = 6$$

and

$$P(x \ge 4) = 1 - P(x \le 3) = 1 - \sum_{x=0}^{3} p(x; 6) = 1 - 0.1512 = 0.8488$$ ▲
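The only modeling step in Example 5.10 is rescaling the rate to the size of the bolt, λ = αt; after that it is a complement of a short Poisson sum, as this sketch shows (variable names are illustrative).

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lam) of Definition 5.7."""
    return lam**x * exp(-lam) / factorial(x)


alpha = 2  # mean number of defects per unit of area (1 unit = 10 square yards)
t = 3      # a 30-square-yard bolt is 3 units
lam = alpha * t

p_at_most_3 = sum(poisson_pmf(x, lam) for x in range(4))
print(round(p_at_most_3, 4), round(1 - p_at_most_3, 4))  # 0.1512 0.8488
```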
THEORETICAL EXERCISES


1. The negative binomial distribution is sometimes defined differently than in
this text, as the distribution of the number of failures that precede the kth
success. If the kth success occurs on the xth trial, it must be preceded by
x − k failures. Thus, find the distribution of y = x − k, where x has the
distribution of Definition 5.4. Also, use Theorem 5.6 to find expressions for
$\mu_y$ and $\sigma_y^2$.


2. Prove Theorem 5.5. 


3. Derive the formulas for the mean and the variance of the negative binomial
distribution by first determining E(x) and E[x(x + 1)].


4. Show that the moment-generating function of the geometric distribution is
given by

$$M_x(t) = \frac{\theta e^t}{1 - e^t(1 - \theta)}$$

and use it to verify that μ = 1/θ and σ² = (1 − θ)/θ².


5. Differentiating with respect to θ the expressions on both sides of the equation

$$\sum_{x=1}^{\infty} \theta(1 - \theta)^{x-1} = 1$$

show that the mean of the geometric distribution is given by μ = 1/θ. Then,
differentiating again with respect to θ, show that $\mu_2' = \frac{2 - \theta}{\theta^2}$ and, hence,
that $\sigma^2 = \frac{1 - \theta}{\theta^2}$.


6. If x is a geometric random variable, show that

$$P(x = x + n \mid x > n) = P(x = x)$$


7. If the probability is f(x) that a product fails the xth time it is being used,
that is, on the xth trial, then its failure rate at the xth trial is the probability
that it will fail on the xth trial given that it has not failed on the first x − 1
trials; symbolically, it is given by

$$Z(x) = \frac{f(x)}{1 - F(x - 1)}$$

where F(x) is the value of the corresponding distribution function at x. Show
that if x is a geometric random variable, its failure rate is constant and equal
to θ.


8. Hypergeometric sampling (sampling without replacement) is one variation
of binomial sampling (sampling with replacement); another variation arises
when the trials are independent but the probability of a success on the ith
trial is $\theta_i$, and the θ's are not all equal. If x is the number of successes
obtained with this kind of sampling in n trials, show that

(a) $\mu_x = n\bar{\theta}$, where $\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n} \theta_i$;

(b) $\sigma_x^2 = n\bar{\theta}(1 - \bar{\theta}) - n\sigma_\theta^2$, where $\bar{\theta}$ is as defined in part (a) and
$\sigma_\theta^2 = \frac{1}{n}\sum_{i=1}^{n} (\theta_i - \bar{\theta})^2$.


9. When calculating all the values of a hypergeometric distribution, the work
can often be simplified by first calculating h(0; n, N, k) and then using the
recursion formula

$$h(x + 1; n, N, k) = \frac{(n - x)(k - x)}{(x + 1)(N - k - n + x + 1)} \cdot h(x; n, N, k)$$

Verify this formula and use it to calculate the values of the hypergeometric
distribution with n = 4, N = 9, and k = 5.
10. Verify the expression given for E[x(x − 1)] in the proof of Theorem 5.7.

11. Show that if we let θ = k/N in Theorem 5.7, the mean and the variance of the
hypergeometric distribution can be written as μ = nθ and
$\sigma^2 = n\theta(1 - \theta) \cdot \frac{N - n}{N - 1}$. How do these results tie in with the discussion on
page 192?

12. Find a recursion formula for the Poisson distribution which expresses
p(x + 1; λ) in terms of p(x; λ), and use it to verify the values given in Table
II for λ = 2. (Use $e^{-2} = 0.1353$.)

13. Approximate the binomial probability b(3; 100, 0.10) by looking up the
Poisson probability p(3; 10) in Table II, and compare the result with the
exact probability calculated with the use of logarithms.
14. Suppose that f(x, t) is the probability of getting x successes during a time
interval of length t when (i) the probability of a success during a very small
time interval from t to t + Δt is α · Δt, (ii) the probability of more than one
success during such a time interval is negligible, and (iii) the probability of
a success during such a time interval does not depend on what happened
prior to time t.

(a) Show that under these conditions

$$f(x, t + \Delta t) = f(x, t)[1 - \alpha \cdot \Delta t] + f(x - 1, t)\,\alpha \cdot \Delta t$$

and, hence, that

$$\frac{d[f(x, t)]}{dt} = \alpha[f(x - 1, t) - f(x, t)]$$

(b) Show by direct substitution that a solution of this infinite system of
differential equations (there is one for each value of x) is given by the
Poisson distribution with λ = αt.


15. Use integration by parts to show that

$$\sum_{y=0}^{x} \frac{\lambda^y e^{-\lambda}}{y!} = \frac{1}{x!}\int_{\lambda}^{\infty} t^x e^{-t}\,dt$$

This result is important because values of the distribution function of a
Poisson random variable may, thus, be obtained by referring to a table of
incomplete gamma functions.


16. Derive the formulas for the mean and the variance of the Poisson distribution
by first evaluating E(x) and E[x(x − 1)].


17. Show that if the limiting conditions n → ∞, θ → 0, while nθ remains constant,
are applied to the moment-generating function of the binomial distribution,
we get the moment-generating function of the Poisson distribution. [Hint:
Make use of the fact that $\lim_{n \to \infty}\left(1 + \frac{z}{n}\right)^n = e^z$.]


18. Use Theorem 5.9 to show that for the Poisson distribution $\alpha_3 = \frac{1}{\sqrt{\lambda}}$, where
$\alpha_3$ is the measure of skewness defined in Exercise 10 on page 157.
19. Differentiating with respect to λ the expressions on both sides of the equation

$$\mu_r = \sum_{x=0}^{\infty} (x - \lambda)^r \cdot \frac{\lambda^x e^{-\lambda}}{x!}$$

derive the following recursion formula for the moments about the mean of
the Poisson distribution:

$$\mu_{r+1} = \lambda\left[r\mu_{r-1} + \frac{d\mu_r}{d\lambda}\right]$$

for r = 1, 2, 3, .... Also, use this recursion formula and the fact that $\mu_0 = 1$
and $\mu_1 = 0$ to find $\mu_2$, $\mu_3$, and $\mu_4$, and verify the formula given for $\alpha_3$ in
Exercise 18.


20. Use Theorem 5.9 to find the moment-generating function of y = x − λ, where
x is a random variable having a Poisson distribution with the parameter λ,
and use it to verify that $\sigma_y^2 = \lambda$.


APPLIED EXERCISES 


21. If the probability is 0.75 that a person will believe a rumor about the
transgressions of a certain politician, find the probabilities that

(a) the eighth person to hear the rumor will be the fifth to believe it;

(b) the fifteenth person to hear the rumor will be the tenth to believe it.


22. If the probabilities of having a male or female child are both 0.50, find the
probabilities that

(a) a family’s fourth child is their first son;

(b) a family’s seventh child is their second daughter; 

(c) a family’s tenth child is their fourth or fifth son. 


23. Use Theorem 5.5 and Table I to check the result of Example 5.5 on page 189.


24. When taping a television commercial, the probability is 0.30 that a certain
actor will get his lines straight on any one take. What is the probability that
he will get his lines straight for the first time on the sixth take?


25. In a “torture test” a light switch is turned on and off until it fails. If the
probability is 0.001 that the switch will fail any time it is turned on or off,
what is the probability that the switch will not fail during the first 800 times
it is turned on or off? Assume that the conditions underlying the geometric
distribution are met and use logarithms.


26. A quality control engineer inspects a random sample of two hand-held
calculators from each incoming lot of size 18, and accepts the lot if they are 
both in good working condition; otherwise, the entire lot is inspected with 
the cost charged to the vendor. What are the probabilities that such a lot will 
be accepted without further inspection if it contains 

(a) 4 calculators that are not in good working condition; 

(b) 8 calculators that are not in good working condition; 

(c) 12 calculators that are not in good working condition? 


27. Among the 16 applicants for a job, ten have college degrees. If three of the
applicants are randomly chosen for interviews, what are the probabilities that
(a) none has a college degree; 

(b) one has a college degree; 

(c) two have college degrees; 

(d) all three have college degrees? 
28. What is the probability that an I.R.S. auditor will catch only two income tax
returns with illegitimate deductions, if she randomly selects five returns from
among 15 returns of which 9 contain illegitimate deductions?

29. A shipment of 80 burglar alarms contains 4 that are defective. If 3 of these
are randomly selected and shipped to a customer, find the probability that 
the customer will get exactly one bad unit using 

(a) the formula of the hypergeometric distribution; 

(b) the binomial distribution as an approximation. 


30. A large shipment of books contains 3 percent with imperfect bindings. Use
the Poisson approximation to determine the probabilities that among 400
books randomly selected from the shipment

(a) exactly 10 will have imperfect bindings;

(b) at least 10 will have imperfect bindings.


31. It is known from experience that 1.4 percent of the calls received by a
switchboard are wrong numbers. Use the Poisson approximation to the 
binomial distribution to determine the probability that among 150 calls 
received by the switchboard two are wrong numbers. 


32. Records show that the probability is 0.0004 that a car will break down while
driving through a certain tunnel. Use the Poisson approximation to the
binomial distribution to find the probability that among 2,000 cars driving
through the tunnel at most one will break down.


33. In a certain desert region the number of persons who become seriously ill
each year from eating a given kind of poisonous plant is a random variable
having a Poisson distribution with λ = 1.6. Find the probabilities of

(a) 2 such illnesses in a given year;

(b) at least 7 such illnesses in 5 years.


34. In the inspection of a fabric produced in continuous rolls, the number of
imperfections per ten yards is a random variable having a Poisson distribution
with λ = 2.8. Find the probabilities that

(a) ten yards of the fabric will have 3 imperfections;

(b) twenty yards of the fabric will have at most 6 imperfections.


35. If the number of incoming airplanes per minute at a large metropolitan
airport is a random variable having a Poisson distribution with λ = 0.9,
use Table II at the end of the book to find the probabilities that there will
be


(a) exactly 9 incoming planes during a period of 5 minutes; 
(b) fewer than 10 incoming planes during a period of 8 minutes; 
(c) at least 14 incoming planes during a period of 11 minutes. 
5.8 THE MULTINOMIAL DISTRIBUTION


An immediate generalization of the binomial distribution arises when each trial 
has more than two possible outcomes, the probabilities of the respective outcomes 
are the same for each trial, and the trials are all independent. This would be the 
case, for instance, when persons interviewed by an opinion poll are asked whether 
they are for a candidate, against her, or undecided, or when samples of
manufactured products are rated excellent, above average, average, or inferior.

To treat this kind of problem in general, let us consider the case where
there are n independent trials permitting k mutually exclusive outcomes whose
respective probabilities are $\theta_1, \theta_2, \ldots, \theta_k$ with $\sum_{i=1}^{k} \theta_i = 1$. Referring to the
outcomes as being of the first kind, the second kind, ..., and the kth kind, we shall
be interested in the probability of getting $x_1$ outcomes of the first kind, $x_2$ outcomes
of the second kind, ..., and $x_k$ outcomes of the kth kind (with $\sum_{i=1}^{k} x_i = n$).

Proceeding as in the derivation of the formula for the binomial distribution,
we first find that the probability of getting $x_1$ outcomes of the first kind, $x_2$
outcomes of the second kind, ..., and $x_k$ outcomes of the kth kind in a specific
order is $\theta_1^{x_1} \cdot \theta_2^{x_2} \cdots \theta_k^{x_k}$. To get the corresponding probability for that many
outcomes of each kind in any order, we shall have to multiply the probability for
any specific order by

$$\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1! \cdot x_2! \cdots x_k!}$$

according to Theorem 1.8 on page 12.


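DEFINITION 5.8 The random variables $x_1, x_2, \ldots,$ and $x_k$ have a multinomial
distribution, and they are referred to as multinomial random variables,
if and only if their joint probability distribution is given by

$$f(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k) = \binom{n}{x_1, x_2, \ldots, x_k} \cdot \theta_1^{x_1} \cdot \theta_2^{x_2} \cdots \theta_k^{x_k}$$

for $x_i = 0, 1, \ldots, n$ for each i, where $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} \theta_i = 1$.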
Thus, the numbers of outcomes of the different kinds are random variables having
the multinomial distribution with the parameters $n, \theta_1, \theta_2, \ldots,$ and $\theta_k$. The name
“multinomial” derives from the fact that for various values of the $x_i$, the
probabilities equal corresponding terms of the multinomial expansion of
$(\theta_1 + \theta_2 + \cdots + \theta_k)^n$.


EXAMPLE 5.11


In a given city on a Saturday night, Channel 12 has 50 percent of the viewing 
audience, Channel 10 has 30 percent of the viewing audience, and Channel 3 
has 20 percent of the viewing audience. Find the probability that among eight 
television viewers in that city, randomly chosen on a Saturday night, five will be 
watching Channel 12, two will be watching Channel 10, and one will be watching 
Channel 3. 


Solution 


Substituting $x_1 = 5$, $x_2 = 2$, $x_3 = 1$, $\theta_1 = 0.50$, $\theta_2 = 0.30$, $\theta_3 = 0.20$, and
n = 8 into the formula of Definition 5.8, we get

$$f(5, 2, 1; 8, 0.50, 0.30, 0.20) = \frac{8!}{5! \cdot 2! \cdot 1!}(0.50)^5(0.30)^2(0.20) = 0.0945$$ ▲
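A direct transcription of Definition 5.8 reproduces Example 5.11: the multinomial coefficient is 8!/(5! 2! 1!) = 168, and multiplying by the probability of one specific order gives 0.0945. The helper `multinomial_pmf` is a hypothetical name; `math.prod` requires Python 3.8 or later.

```python
from math import factorial, prod


def multinomial_pmf(xs, thetas):
    """f(x_1, ..., x_k; n, theta_1, ..., theta_k) with n = sum(xs)."""
    n = sum(xs)
    coefficient = factorial(n)
    for x in xs:
        coefficient //= factorial(x)  # stays an exact integer at every step
    return coefficient * prod(t**x for t, x in zip(thetas, xs))


p = multinomial_pmf([5, 2, 1], [0.50, 0.30, 0.20])
print(round(p, 4))  # 0.0945
```

With k = 2 the function reduces to the binomial distribution, for example `multinomial_pmf([1, 1], [0.5, 0.5])` equals b(1; 2, 0.5) = 0.5.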


5.9 THE MULTIVARIATE 
HYPERGEOMETRIC DISTRIBUTION 


Just as the hypergeometric distribution takes the place of the binomial distribution
for sampling without replacement, there also exists a multivariate distribution
analogous to the multinomial distribution which applies to sampling without
replacement. To derive its formula, let us consider a set of N elements, of which
$a_1$ are elements of the first kind, $a_2$ are elements of the second kind, ..., and $a_k$
are elements of the kth kind, such that $\sum_{i=1}^{k} a_i = N$. As in connection with the
multinomial distribution, we are interested in the probability of getting $x_1$ elements
(outcomes) of the first kind, $x_2$ elements of the second kind, ..., and $x_k$ elements
of the kth kind, but now we are choosing, without replacement, n of the N
elements of the set.


There are $\binom{a_1}{x_1}$ ways of choosing $x_1$ of the $a_1$ elements of the first
kind, $\binom{a_2}{x_2}$ ways of choosing $x_2$ of the $a_2$ elements of the second kind, ..., and
$\binom{a_k}{x_k}$ ways of choosing $x_k$ of the $a_k$ elements of the kth kind, and, hence,
$\binom{a_1}{x_1}\binom{a_2}{x_2}\cdots\binom{a_k}{x_k}$ ways of choosing the required $\sum_{i=1}^{k} x_i = n$ elements.
Since there are $\binom{N}{n}$ ways of choosing n of the N elements in the set and we
assume that they are all equally likely (which is what we mean when we say that
the selection is random), it follows that the desired probability is given by

$$\frac{\binom{a_1}{x_1}\binom{a_2}{x_2}\cdots\binom{a_k}{x_k}}{\binom{N}{n}}$$

DEFINITION 5.9 The random variables $x_1, x_2, \ldots,$ and $x_k$ have a multivariate
hypergeometric distribution, and they are referred to as multivariate
hypergeometric random variables, if and only if their joint probability
distribution is given by

$$f(x_1, x_2, \ldots, x_k; n, a_1, a_2, \ldots, a_k) = \frac{\binom{a_1}{x_1}\binom{a_2}{x_2}\cdots\binom{a_k}{x_k}}{\binom{N}{n}}$$

for $x_i = 0, 1, \ldots, n$ and $x_i \le a_i$ for each i, where $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} a_i = N$.

Thus, the joint distribution of the random variables under consideration, namely,
the distribution of the numbers of outcomes of the different kinds, is a multivariate
hypergeometric distribution with the parameters $n, a_1, a_2, \ldots,$ and $a_k$.


EXAMPLE 5.12 


A panel of prospective jurors includes six married men, three single men, seven
married women, and four single women. If the selection is random, what is the
probability that a jury will consist of four married men, one single man, five
married women, and two single women?
Solution


Substituting $x_1 = 4$, $x_2 = 1$, $x_3 = 5$, $x_4 = 2$, $a_1 = 6$, $a_2 = 3$, $a_3 = 7$, $a_4 = 4$,
N = 20, and n = 12 into the formula of Definition 5.9, we get

$$f(4, 1, 5, 2; 12, 6, 3, 7, 4) = \frac{\binom{6}{4}\binom{3}{1}\binom{7}{5}\binom{4}{2}}{\binom{20}{12}} = 0.0450$$
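The arithmetic of Example 5.12 is a product of four binomial coefficients divided by $\binom{20}{12}$, that is, 5670/125970. The sketch below (helper name hypothetical) computes it; with only two kinds of elements the same function reduces to the ordinary hypergeometric distribution, which the second call illustrates with the numbers of Example 5.6.

```python
from math import comb, prod


def mv_hypergeom_pmf(xs, a):
    """Probability of drawing xs[i] elements of kind i when n = sum(xs) of the
    N = sum(a) elements are chosen at random without replacement."""
    n, N = sum(xs), sum(a)
    return prod(comb(ai, xi) for ai, xi in zip(a, xs)) / comb(N, n)


# Example 5.12: married men, single men, married women, single women
p = mv_hypergeom_pmf([4, 1, 5, 2], [6, 3, 7, 4])
print(round(p, 4))  # 0.045, i.e. 0.0450

# Two kinds only: same value as h(0; 6, 24, 4) in Example 5.6
print(round(mv_hypergeom_pmf([0, 6], [4, 20]), 4))  # 0.288
```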


APPLIED EXERCISES 


1. The probabilities are 0.40, 0.50, and 0.10 that, in city driving, a certain kind
of compact car will average less than 22 miles per gallon, from 22 to 26 miles
per gallon, or more than 26 miles per gallon. Find the probability that among
ten such cars tested, three will average less than 22 miles per gallon, six will
average from 22 to 26 miles per gallon, and one will average more than 26
miles per gallon.


2. Suppose that the probabilities are 0.60, 0.20, 0.10, and 0.10 that a state income
tax return will be filled out correctly, that it will contain only errors favoring
the taxpayer, that it will contain only errors favoring the state, or that it will 
contain both kinds of errors. What is the probability that among 12 such 
income tax returns randomly chosen for audit, five will be filled out correctly, 
four will contain only errors favoring the taxpayer, two will contain only errors 
favoring the state, and one will contain both kinds of errors? 


3. According to the Mendelian theory of heredity, if plants with round yellow
seeds are crossbred with plants with wrinkled green seeds, the probabilities
of getting a plant that produces round yellow seeds, wrinkled yellow seeds,
round green seeds, or wrinkled green seeds are, respectively, 9/16, 3/16, 3/16, and 1/16.
What is the probability that among nine plants thus obtained there will be
four that produce round yellow seeds, two that produce wrinkled yellow seeds,
three that produce round green seeds, and none that produce wrinkled green
seeds?


4. If 18 defective glass bricks include 10 that have cracks but no discoloration,
five that have discoloration but no cracks, and three that have cracks and
discoloration, what is the probability that among six of the bricks (chosen at
random for further checks) three will have cracks but no discoloration, one
will have discoloration but no cracks, and two will have cracks and
discoloration?


5. Among 25 silver dollars struck in 1903 there are 15 from the Philadelphia
mint, seven from the New Orleans mint, and three from the San Francisco
mint. If five of these silver dollars are picked at random, find the probabilities
of getting

(a) four from the Philadelphia mint and one from the New Orleans mint;

(b) three from the Philadelphia mint and one from each of the other two
mints.
Special Probability Densities


6.1 INTRODUCTION


In this chapter we shall study some of the probability densities which figure most
prominently in statistical theory and in applications. In addition to the ones given
in the text, several others are introduced in the exercises following Section 6.4,
and three probability densities that are of basic importance in the theory of
sampling will be taken up in Chapter 8. As in Chapter 5, we shall derive parameters
and moment-generating functions, again leaving some of the details as exercises.


6.2 THE UNIFORM DENSITY 


The probability densities of Examples 3.8 and 3.11 are special cases of the uniform 
density, whose graph may be pictured as in Figure 3.7 on page 96. 


DEFINITION 6.1 A random variable x has a uniform density, and it is referred
to as a continuous uniform random variable, if and only if its probability
density is given by

$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{for } \alpha < x < \beta \\ 0 & \text{elsewhere} \end{cases}$$

The parameters α and β of this probability density are real constants with α < β.

Although the uniform density has some direct applications, one of which 
will be discussed in Section 7.2, its main value is that, due to its simplicity, it 
lends itself readily to the task of illustrating various aspects of statistical theory. 
The reader will be asked to verify in Exercise 2 on page 217 that 


THEOREM 6.1 The mean and the variance of the uniform density are given
by

$$\mu = \frac{\alpha + \beta}{2} \qquad \text{and} \qquad \sigma^2 = \frac{1}{12}(\beta - \alpha)^2$$
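Theorem 6.1 can be checked by approximating the integrals for E(x) and E(x²) with the midpoint rule. The sketch below does this for the hypothetical parameters α = 2 and β = 5, where the theorem gives μ = 3.5 and σ² = 9/12 = 0.75; the number of subintervals is an arbitrary choice.

```python
def uniform_moments(alpha: float, beta: float, steps: int = 100_000):
    """Midpoint-rule approximations of the mean and variance of the uniform density."""
    width = (beta - alpha) / steps
    density = 1 / (beta - alpha)  # f(x) on alpha < x < beta
    mids = [alpha + (i + 0.5) * width for i in range(steps)]
    mean = sum(x * density * width for x in mids)
    second_moment = sum(x * x * density * width for x in mids)
    return mean, second_moment - mean**2


alpha, beta = 2.0, 5.0
mean, var = uniform_moments(alpha, beta)
print(mean, (alpha + beta) / 2)        # both approximately 3.5
print(var, (beta - alpha) ** 2 / 12)   # both approximately 0.75
```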


6.3 THE GAMMA, EXPONENTIAL, 
AND CHI-SQUARE DISTRIBUTIONS 


Some of the examples and exercises of Chapters 3 and 4 dealt with random
variables having densities of the form

$$f(x) = \begin{cases} kx^{\alpha-1}e^{-x/\beta} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

where α > 0, β > 0, and k must be such that the total area under the curve is
equal to 1. To evaluate k, we first make the substitution y = x/β, which yields

$$\int_0^{\infty} kx^{\alpha-1}e^{-x/\beta}\,dx = k\beta^{\alpha}\int_0^{\infty} y^{\alpha-1}e^{-y}\,dy$$
The integral thus obtained depends on α alone, and it defines the well-known
gamma function

$$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}e^{-y}\,dy \qquad \text{for } \alpha > 0$$

which is treated in detail in most advanced calculus texts. Integrating by parts,
which will be left to the reader in Exercise 4 on page 217, we find that the gamma
function satisfies the recursion formula

$$\Gamma(\alpha) = (\alpha - 1) \cdot \Gamma(\alpha - 1)$$

for α > 1, and since

$$\Gamma(1) = \int_0^{\infty} e^{-y}\,dy = 1$$

it follows by repeated application of the recursion formula that Γ(α) = (α − 1)!
when α is a positive integer. Also, an important special value is $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$, as
the reader will be asked to verify in Exercise 6 on page 217.
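Python's standard library exposes the gamma function as `math.gamma`, so the three facts just stated can be checked numerically in a few lines. The test values of α below are arbitrary.

```python
import math

# Gamma(alpha) = (alpha - 1)! for positive integers alpha
for a in range(1, 10):
    assert math.isclose(math.gamma(a), math.factorial(a - 1))

# Recursion formula Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) for alpha > 1
for a in (1.5, 2.7, 6.25):
    assert math.isclose(math.gamma(a), (a - 1) * math.gamma(a - 1))

# Special value Gamma(1/2) = sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))
```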

Returning now to the problem of evaluating k, we equate the integral we
obtained to 1, getting

$$\int_0^{\infty} kx^{\alpha-1}e^{-x/\beta}\,dx = k\beta^{\alpha}\Gamma(\alpha) = 1$$

and, hence,

$$k = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}$$

This leads to the following definition of the gamma distribution:


DEFINITION 6.2 A random variable x has a gamma distribution, and it is 
referred to as a gamma random variable, if and only if its probability density 
is given by 


1 
f(x) = 1B*T() * 


0 elsewhere 


ае а тос i0 


where a > 0 and B > 0. 
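The normalizing constant $k = 1/[\beta^\alpha\Gamma(\alpha)]$ can be checked numerically. The sketch below codes the gamma density of Definition 6.2 and approximates its total area with a midpoint rule over $(0, 200)$. It assumes $\alpha \ge 1$ so that the density is bounded near 0. The helper names and parameter values are illustrative.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density of Definition 6.2 (alpha > 0, beta > 0)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def total_area(alpha, beta, upper=200.0, n=200_000):
    """Midpoint-rule approximation of the integral of the density from 0 to upper."""
    h = upper / n
    return sum(gamma_pdf((i + 0.5) * h, alpha, beta) for i in range(n)) * h

print(total_area(2, 3), total_area(4.5, 0.5))   # both very close to 1
```

The upper limit of 200 is a practical truncation. For these parameter values the omitted tail area is far below double-precision resolution.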
When $\alpha$ is not a positive integer, the value of $\Gamma(\alpha)$ will have to be looked up in a special table. To give the reader some idea about the shape of the graphs of gamma densities, those for several special values of $\alpha$ and $\beta$ are shown in Figure 6.1.


Figure 6.1 Graphs of gamma distributions.


Special cases of the gamma distribution play important roles in statistics; for instance, for $\alpha = 1$ and $\beta = \theta$ we get

DEFINITION 6.3 A random variable x has an exponential distribution, and it is referred to as an exponential random variable, if and only if its probability density is given by

$$f(x) = \begin{cases}\dfrac{1}{\theta}e^{-x/\theta} & \text{for } x > 0\\ 0 & \text{elsewhere}\end{cases}$$

where $\theta > 0$.


To show how an exponential distribution might arise in practice, let us refer to the situation described in Exercise 14 on page 199. There we were interested in the probability of getting x successes during a time interval of length t when (i) the probability of a success during a very small time interval from $t$ to $t + \Delta t$ is $\alpha\cdot\Delta t$, (ii) the probability of more than one success during such a time interval is negligible, and (iii) the probability of a success during such a time interval does not depend on what happened prior to time t. In that exercise, we showed that the number of successes is a value of the discrete random variable x having the Poisson distribution as defined on page 194 with $\lambda = \alpha t$. Now, let us determine the probability density of the continuous random variable y, the waiting time until the first success. Clearly,
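$$\begin{aligned} F(y) = P(\mathbf{y} \le y) &= 1 - P(\mathbf{y} > y)\\ &= 1 - P(\text{0 successes in a time interval of length } y)\\ &= 1 - p(0; \alpha y)\\ &= 1 - \frac{e^{-\alpha y}(\alpha y)^0}{0!}\\ &= 1 - e^{-\alpha y} \qquad\text{for } y > 0\end{aligned}$$

and $F(y) = 0$ elsewhere. Having thus found the distribution function of y, differentiation with respect to $y$ yields

$$f(y) = \begin{cases}\alpha e^{-\alpha y} & \text{for } y > 0\\ 0 & \text{elsewhere}\end{cases}$$

which is the exponential distribution with $\theta = \dfrac{1}{\alpha}$.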
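The derivation can be illustrated by simulation. The sketch chops time into small intervals of length $\Delta t$. Each interval independently contains a success with probability $\alpha\,\Delta t$, which mimics conditions (i) to (iii). The sketch then records the waiting time until the first success. The values $\alpha = 2$, $\Delta t = 0.001$, the seed, and the helper name `first_success_time` are illustrative assumptions. Because time is discretized, the match with $\theta = 1/\alpha$ and $F(y) = 1 - e^{-\alpha y}$ is only approximate.

```python
import math
import random

def first_success_time(alpha, dt, rng):
    """Step through intervals of length dt. Each has success probability alpha*dt,
    independently of the past (conditions (i)-(iii)). Return the waiting time."""
    steps = 1
    while rng.random() >= alpha * dt:
        steps += 1
    return steps * dt

rng = random.Random(12345)        # fixed seed so the run is reproducible
alpha, dt, trials = 2.0, 0.001, 2000
waits = [first_success_time(alpha, dt, rng) for _ in range(trials)]

mean_wait = sum(waits) / trials
frac_within = sum(w <= 0.5 for w in waits) / trials
print(mean_wait, 1 / alpha)                       # theta = 1/alpha = 0.5
print(frac_within, 1 - math.exp(-alpha * 0.5))    # F(0.5) = 1 - e^{-1}
```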
The exponential distribution applies not only to the occurrence of the first success in a Poisson process (which is what we call a situation like that described in Exercise 14 on page 199) but, by virtue of condition (iii) (see Exercise 12 on page 218), also to the waiting times between successes.
EXAMPLE 6.1 


If the number of speeders a radar unit spots per hour at a certain location on Route I-10 is a Poisson random variable with $\lambda = 8.4$, what is the probability of a waiting time less than 10 minutes between successive speeders?


Solution 


Since the unit time interval here is one hour, we have $\lambda = \alpha = 8.4$. Treating the waiting time as a random variable having an exponential distribution with $\theta = \dfrac{1}{8.4}$, and noting that 10 minutes is $\tfrac16$ hour, we get

$$\int_0^{1/6} 8.4e^{-8.4x}\,dx = 1 - e^{-1.4}$$

which is approximately 0.75. ▲
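The arithmetic in Example 6.1 amounts to evaluating the exponential distribution function at $1/6$ hour. Here is a minimal Python version; the variable names are illustrative.

```python
import math

lam = 8.4                           # speeders per hour, so alpha = 8.4 and theta = 1/8.4
wait = 10 / 60                      # 10 minutes expressed in hours
prob = 1 - math.exp(-lam * wait)    # F(1/6) = 1 - e^{-1.4}
print(round(prob, 4))
```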
Another special case of the gamma distribution arises when $\alpha = \dfrac{\nu}{2}$ and $\beta = 2$, where $\nu$ is the lowercase Greek letter nu.


DEFINITION 6.4 A random variable x has a chi-square distribution, and it is referred to as a chi-square random variable, if and only if its probability density is given by

$$f(x) = \begin{cases}\dfrac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{\frac{\nu-2}{2}}e^{-x/2} & \text{for } x > 0\\ 0 & \text{elsewhere}\end{cases}$$


The parameter $\nu$ is referred to as the number of degrees of freedom, or simply the degrees of freedom. The chi-square distribution plays a very important role in sampling theory, and it will be discussed in some detail in Chapter 8.

To derive formulas for the mean and the variance of the gamma distribution 
and, hence, the exponential and chi-square distributions, let us first prove the 
following theorem: 


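THEOREM 6.2 The rth moment about the origin of the gamma distribution is given by

$$\mu_r' = \frac{\beta^r\,\Gamma(\alpha + r)}{\Gamma(\alpha)}$$

Proof. By Definition 4.2,

$$\mu_r' = \int_0^\infty x^r\cdot\frac{1}{\beta^\alpha\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}\,dx = \frac{\beta^r}{\Gamma(\alpha)}\int_0^\infty y^{\alpha+r-1}e^{-y}\,dy$$

where we let $y = \dfrac{x}{\beta}$. Since the integral on the right is $\Gamma(r + \alpha)$ according to the definition of the gamma function on page 210, this completes the proof. ■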


Using Theorem 6.2, let us now prove the following results: 


214 Chap. 6: Special Probability Densities 


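THEOREM 6.3 The mean and the variance of the gamma distribution are given by

$$\mu = \alpha\beta \quad\text{and}\quad \sigma^2 = \alpha\beta^2$$

Proof. From Theorem 6.2,

$$\mu_1' = \frac{\beta\,\Gamma(\alpha + 1)}{\Gamma(\alpha)} = \alpha\beta$$

and

$$\mu_2' = \frac{\beta^2\,\Gamma(\alpha + 2)}{\Gamma(\alpha)} = \alpha(\alpha + 1)\beta^2$$

so that $\mu = \alpha\beta$ and $\sigma^2 = \alpha(\alpha + 1)\beta^2 - (\alpha\beta)^2 = \alpha\beta^2$. ■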
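Theorems 6.2 and 6.3 can be cross-checked numerically. The sketch evaluates $\mu_r' = \beta^r\Gamma(\alpha + r)/\Gamma(\alpha)$ directly, derives the mean and variance from it, and compares $\mu_2'$ with a midpoint-rule integral of $x^2 f(x)$. The choice $\alpha = 3$, $\beta = 2$ and the helper names are illustrative.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density of Definition 6.2, for x > 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def raw_moment_formula(r, alpha, beta):
    """Theorem 6.2: mu'_r = beta^r * Gamma(alpha + r) / Gamma(alpha)."""
    return beta ** r * math.gamma(alpha + r) / math.gamma(alpha)

def raw_moment_numeric(r, alpha, beta, upper=200.0, n=200_000):
    """Midpoint-rule approximation of the integral of x^r f(x) over (0, upper)."""
    h = upper / n
    return sum(((i + 0.5) * h) ** r * gamma_pdf((i + 0.5) * h, alpha, beta)
               for i in range(n)) * h

alpha, beta = 3.0, 2.0
m1 = raw_moment_formula(1, alpha, beta)
m2 = raw_moment_formula(2, alpha, beta)
print(m1, alpha * beta)                          # mean from Theorem 6.3
print(m2 - m1 ** 2, alpha * beta ** 2)           # variance from Theorem 6.3
print(raw_moment_numeric(2, alpha, beta), m2)    # numeric vs formula for mu_2'
```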


COROLLARY 1 The mean and the variance of the exponential distribution are given by

$$\mu = \theta \quad\text{and}\quad \sigma^2 = \theta^2$$

COROLLARY 2 The mean and the variance of the chi-square distribution are given by

$$\mu = \nu \quad\text{and}\quad \sigma^2 = 2\nu$$

To obtain these results from Theorem 6.3, we substitute $\alpha = 1$ and $\beta = \theta$ for the exponential distribution, and $\alpha = \dfrac{\nu}{2}$ and $\beta = 2$ for the chi-square distribution.


For future reference, let us give here also the moment-generating function 
of the gamma distribution. 


THEOREM 6.4 The moment-generating function of the gamma distribution is given by

$$M_x(t) = (1 - \beta t)^{-\alpha}$$
The reader will be asked to prove this result, and use it to find some of the lower moments of the gamma distribution, in Exercises 9 and 10 on page 217.
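Theorem 6.4 can be checked numerically before it is proved. The sketch approximates $E(e^{tx}) = \int_0^\infty e^{tx}f(x)\,dx$ for a gamma random variable with a midpoint rule and compares the result with $(1 - \beta t)^{-\alpha}$. The defining integral converges only for $t < 1/\beta$, so the example uses $t = 0.3$ with $\beta = 1.5$. These parameter values and the function name are illustrative.

```python
import math

def mgf_numeric(t, alpha, beta, upper=400.0, n=400_000):
    """Midpoint-rule approximation of E(e^{tx}) for a gamma random variable (needs t < 1/beta)."""
    h = upper / n
    c = 1.0 / (beta ** alpha * math.gamma(alpha))    # normalizing constant of Definition 6.2
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.exp(t * x) * c * x ** (alpha - 1) * math.exp(-x / beta)
    return total * h

alpha, beta, t = 2.5, 1.5, 0.3       # t < 1/beta = 0.667
print(mgf_numeric(t, alpha, beta), (1 - beta * t) ** (-alpha))
```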


6.4 THE BETA DISTRIBUTION 


The uniform density $f(x) = 1$ for $0 < x < 1$ and $f(x) = 0$ elsewhere is a special case of the beta distribution, which is defined in the following way:


DEFINITION 6.5 A random variable x has a beta distribution, and it is referred to as a beta random variable, if and only if its probability density is given by

$$f(x) = \begin{cases}\dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}x^{\alpha-1}(1 - x)^{\beta-1} & \text{for } 0 < x < 1\\ 0 & \text{elsewhere}\end{cases}$$

where $\alpha > 0$ and $\beta > 0$.


In recent years, the beta distribution has found important applications in Bayesian 
inference, where parameters are looked upon as random variables, and there is 
a need for a fairly "flexible" probability density for the parameter $\theta$ of the
binomial distribution, which takes on non-zero values only on the interval from 
0 to 1. By "flexible" we mean that the probability density can take on a great 
variety of different shapes, as the reader will be asked to verify for the beta 
distribution in Exercise 19 on page 219. This use of the beta distribution will be 
discussed in Chapter 10. 

We shall not prove here that the total area under the curve of the beta distribution is equal to 1, but in the proof of the theorem which follows we shall make use of the fact that

$$\int_0^1\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}x^{\alpha-1}(1 - x)^{\beta-1}\,dx = 1$$

and, hence, that

$$\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx = \frac{\Gamma(\alpha)\cdot\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$
This integral defines the beta function, whose values are denoted $B(\alpha, \beta)$; in other words, $B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\cdot\Gamma(\beta)}{\Gamma(\alpha + \beta)}$. Detailed discussion of the beta function may be found in any textbook on advanced calculus.
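The identity $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$ can be compared with a direct midpoint-rule evaluation of $\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx$. The sketch does this for two illustrative parameter pairs with $\alpha, \beta \ge 1$, so the integrand is bounded. The function names are hypothetical helpers.

```python
import math

def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_integral(a, b, n=200_000):
    """Midpoint-rule approximation of the integral of x^(a-1) (1-x)^(b-1) over (0, 1)."""
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
               for i in range(n)) * h

print(beta_function(2, 5), beta_integral(2, 5))    # both close to 1/30
print(beta_function(4, 2), beta_integral(4, 2))    # both close to 1/20
```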


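THEOREM 6.5 The mean and the variance of the beta distribution are given by

$$\mu = \frac{\alpha}{\alpha + \beta} \quad\text{and}\quad \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

Proof. By definition,

$$\begin{aligned}\mu &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\int_0^1 x\cdot x^{\alpha-1}(1 - x)^{\beta-1}\,dx\\ &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\cdot\frac{\Gamma(\alpha + 1)\cdot\Gamma(\beta)}{\Gamma(\alpha + \beta + 1)}\\ &= \frac{\alpha}{\alpha + \beta}\end{aligned}$$

where we recognized the integral as $B(\alpha + 1, \beta)$ and made use of the fact that $\Gamma(\alpha + 1) = \alpha\cdot\Gamma(\alpha)$ and $\Gamma(\alpha + \beta + 1) = (\alpha + \beta)\cdot\Gamma(\alpha + \beta)$. Similar steps, which will be left to the reader in Exercise 20 on page 219, yield

$$\mu_2' = \frac{(\alpha + 1)\alpha}{(\alpha + \beta + 1)(\alpha + \beta)}$$

and it follows that

$$\sigma^2 = \frac{(\alpha + 1)\alpha}{(\alpha + \beta + 1)(\alpha + \beta)} - \left(\frac{\alpha}{\alpha + \beta}\right)^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

■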
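Theorem 6.5 can be compared against direct numerical integration of the beta density. The sketch accumulates $\int x f(x)\,dx$ and $\int x^2 f(x)\,dx$ with a midpoint rule for the illustrative choice $\alpha = 2$, $\beta = 5$. The helper name is hypothetical.

```python
import math

def beta_moments_numeric(a, b, n=200_000):
    """Midpoint-rule approximations of the mean and variance of a beta density."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))   # from Definition 6.5
    h = 1.0 / n
    m1 = m2 = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        w = const * x ** (a - 1) * (1 - x) ** (b - 1) * h        # probability mass of the cell
        m1 += x * w
        m2 += x * x * w
    return m1, m2 - m1 ** 2

a, b = 2.0, 5.0
mean, var = beta_moments_numeric(a, b)
print(mean, a / (a + b))                              # both close to 2/7
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))      # both close to 10/392
```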


THEORETICAL EXERCISES 


1. Show that if a random variable has a uniform density with the parameters $\alpha$ and $\beta$, the probability that it will take on a value less than $\alpha + p(\beta - \alpha)$ is equal to $p$.

2. Prove Theorem 6.1.

3. A random variable is said to have a Cauchy distribution if its density is given by

$$f(x) = \frac{\beta/\pi}{(x - \alpha)^2 + \beta^2} \qquad\text{for } -\infty < x < \infty$$

Show that for this distribution $\mu_1'$ and $\mu_2'$ do not exist.

4. Use integration by parts to show that $\Gamma(\alpha) = (\alpha - 1)\cdot\Gamma(\alpha - 1)$ for $\alpha > 1$.

5. Perform a suitable change of variable to show that the integral defining the gamma function can be written as

$$\Gamma(\alpha) = 2^{1-\alpha}\int_0^\infty z^{2\alpha-1}e^{-\frac12 z^2}\,dz \qquad\text{for } \alpha > 0$$

6. Using the form of the gamma function of Exercise 5, we can write

$$\Gamma(\tfrac12) = \sqrt{2}\int_0^\infty e^{-\frac12 z^2}\,dz$$

and, hence,

$$\left[\Gamma(\tfrac12)\right]^2 = 2\left\{\int_0^\infty e^{-\frac12 x^2}\,dx\right\}\left\{\int_0^\infty e^{-\frac12 y^2}\,dy\right\} = 2\int_0^\infty\!\!\int_0^\infty e^{-\frac12(x^2 + y^2)}\,dx\,dy$$

Change to polar coordinates to evaluate this double integral, and thus show that $\Gamma(\tfrac12) = \sqrt{\pi}$.

7. Find the probabilities that the value of a random variable will exceed 4, if it has a gamma distribution with
(a) $\alpha = 2$ and $\beta = 3$;
(b) $\alpha = 3$ and $\beta = 4$.

8. Show that a gamma distribution with $\alpha > 1$ has a relative maximum at $x = \beta(\alpha - 1)$. What happens when $0 < \alpha < 1$ and when $\alpha = 1$?

9. Prove Theorem 6.4, making the substitution $y = x\left(\dfrac{1}{\beta} - t\right)$ in the integral defining $M_x(t)$.

10. Expand the moment-generating function of the gamma distribution as a binomial series, and read off the values of $\mu_1'$, $\mu_2'$, and $\mu_3'$.
11. Show that if a random variable has an exponential density with the parameter $\theta$, the probability that it will take on a value less than $-\theta\cdot\ln(1 - p)$ is equal to $p$ for $0 < p < 1$.

12. If x has an exponential distribution, show that

$$P(x \ge t + T \mid x \ge T) = P(x \ge t)$$

This property of an exponential random variable parallels that of a geometric random variable given in Exercise 6 on page 198.

13. A random variable x has a Rayleigh distribution if and only if its probability density is given by

$$f(x) = \begin{cases}2\alpha x e^{-\alpha x^2} & \text{for } x > 0\\ 0 & \text{elsewhere}\end{cases}$$

where $\alpha > 0$. Show that for this distribution

$$\mu = \frac12\sqrt{\frac{\pi}{\alpha}}$$

14. A random variable x has a Pareto distribution if and only if its probability density is given by

$$f(x) = \begin{cases}\dfrac{\alpha}{x^{\alpha+1}} & \text{for } x > 1\\ 0 & \text{elsewhere}\end{cases}$$

Show that $\mu_r'$ exists only if $r < \alpha$.

15. A random variable x has a Weibull distribution if and only if its probability density is given by

$$f(x) = \begin{cases}kx^{\beta-1}e^{-\alpha x^\beta} & \text{for } x > 0\\ 0 & \text{elsewhere}\end{cases}$$

where $\alpha > 0$ and $\beta > 0$.
(a) Express $k$ in terms of $\alpha$ and $\beta$.
(b) Show that $\mu = \alpha^{-1/\beta}\,\Gamma\!\left(1 + \dfrac{1}{\beta}\right)$.

16. If the random variable t is the time to failure of a commercial product and the values of its probability density and distribution function at time t are, respectively, $f(t)$ and $F(t)$, then its failure rate at time t (see also Exercise 7 on page 198) is given by

$$\frac{f(t)}{1 - F(t)}$$

Thus, the failure rate at time t is the probability density of failure at time t given that failure does not occur prior to time t.
(a) Show that if t has an exponential distribution, the failure rate is constant.
(b) Show that if t has a Weibull distribution (see Exercise 15), the failure rate is given by $\alpha\beta t^{\beta-1}$.

17. Verify that the integral of the beta density, from 0 to 1, equals 1 for
(a) $\alpha = 2$ and $\beta = 4$;
(b) $\alpha = 3$ and $\beta = 3$.

18. Show that if $\alpha > 1$ and $\beta > 1$, the beta density has a relative maximum at

$$x = \frac{\alpha - 1}{\alpha + \beta - 2}$$
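19. Sketch the graphs of the beta densities having
(a) $\alpha = 2$ and $\beta = 2$; (b) $\alpha = \tfrac12$ and $\beta = 1$;
(c) $\alpha = 2$ and $\beta = \tfrac12$; (d) $\alpha = 2$ and $\beta = 5$.

[Hint: To evaluate $\Gamma(\tfrac32)$ and $\Gamma(\tfrac52)$, make use of the recursion formula $\Gamma(\alpha) = (\alpha - 1)\cdot\Gamma(\alpha - 1)$ and the result of Exercise 6.]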


20. Verify the expression given for $\mu_2'$ in the proof of Theorem 6.5.

21. Show that the parameters of the beta distribution can be expressed as follows in terms of the mean and the variance of this distribution:
(a) $\alpha = \mu\left[\dfrac{\mu(1 - \mu)}{\sigma^2} - 1\right]$;
(b) $\beta = (1 - \mu)\left[\dfrac{\mu(1 - \mu)}{\sigma^2} - 1\right]$.

22. Karl Pearson, one of the founders of modern statistics, showed that the differential equation

$$\frac{1}{f(x)}\cdot\frac{d[f(x)]}{dx} = \frac{d - x}{a + bx + cx^2}$$

yields (for appropriate values of the constants a, b, c, and d) most of the important distributions of statistics. Verify that the differential equation gives
(a) the gamma distribution when $a = c = 0$, $b > 0$, and $d > -b$;
(b) the exponential distribution when $a = c = d = 0$ and $b > 0$;
(c) the beta distribution when $a = 0$, $b = -c$, $\dfrac{d - 1}{b} < 1$, and $\dfrac{d}{b} > -1$.
APPLIED EXERCISES

23. A point X is chosen on the line AB, whose midpoint is C and whose length is a. If x, the distance from X to A, is a random variable having a uniform density with $\alpha = 0$ and $\beta = a$, what is the probability that AX, BX, and AC will form a triangle?

24. In certain experiments, the error made in determining the density of a substance is a random variable having a uniform density with $\alpha = -0.015$ and $\beta = 0.015$. Find the probabilities that
(a) such an error will be between 0.01 and 0.02;
(b) the size of such an error will exceed 0.005.

25. If a company employs n salespersons, its gross sales in thousands of dollars may be regarded as a random variable having a gamma distribution with $\alpha = 80\sqrt{n}$ and $\beta = 2$. If the sales cost is $8,000 per salesperson, how many salespersons should the company employ to maximize the expected profit?

26. In a certain city, the daily consumption of electric power, in millions of kilowatt-hours, can be treated as a random variable having a gamma distribution with $\alpha = 3$ and $\beta = 2$. If the power plant of this city has a daily capacity of 12 million kilowatt-hours, what is the probability that this power supply will be inadequate on any given day?

27. The mileage (in thousands of miles) which car owners get with a certain kind of radial tire is a random variable having an exponential distribution with $\theta = 40$. Find the probabilities that one of these tires will last
(a) at least 20,000 miles;
(b) at most 30,000 miles.

28. A certain kind of appliance requires repairs on the average once every two years. Assuming that the times between repairs are exponentially distributed, what is the probability that such an appliance will work at least three years without requiring repairs?

29. If the annual proportion of erroneous income tax returns filed with the I.R.S. can be looked upon as a random variable having a beta distribution with $\alpha = 2$ and $\beta = 9$, what is the probability that in any given year there will be fewer than 10 percent erroneous returns?

30. If the annual proportion of new restaurants that fail in a given city may be looked upon as a random variable having a beta distribution with $\alpha = 1$ and $\beta = 4$, find
(a) the mean of this distribution, namely, the annual proportion of new restaurants that can be expected to fail in the given city;
(b) the probability that at least 25 percent of all new restaurants will fail in the given city in any one year.
31. Suppose that the service life, in hours, of a semiconductor is a random variable having a Weibull distribution (see Exercise 15) with $\alpha = 0.025$ and $\beta = 0.50$.
(a) How long can such a semiconductor be expected to last?
(b) What is the probability that such a semiconductor will still be in operating condition after 4,000 hours?


6.5 THE NORMAL DISTRIBUTION


The normal distribution, which we shall study in this section, is in many ways 
the cornerstone of modern statistical theory. It was investigated first in the 
eighteenth century when scientists observed an astonishing degree of regularity 
in errors of measurement. They found that the patterns (distributions) which they 
observed could be closely approximated by continuous curves which they referred 
to as "normal curves of errors" and attributed to the laws of chance. The 
mathematical properties of such normal curves were first studied by Abraham 
de Moivre (1667-1745), Pierre Laplace (1749-1827), and Karl Gauss (1777-1855). 


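DEFINITION 6.6 A random variable x has a normal distribution, and it is referred to as a normal random variable, if and only if its probability density is given by

$$n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac12\left(\frac{x - \mu}{\sigma}\right)^2} \qquad\text{for } -\infty < x < \infty$$

where $\sigma > 0$.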


The graph of a normal distribution, shaped like the cross section of a bell, is 
shown in Figure 6.2. 

The notation which we used here is similar to that used in connection with some of the distributions of Chapter 5; it shows explicitly that the two parameters of the normal distribution are $\mu$ and $\sigma$. It remains to be seen, however, whether the parameter $\mu$ is, in fact, $E(x)$ and the parameter $\sigma$ is, in fact, $\sqrt{\operatorname{var}(x)}$, where x is a random variable having a normal distribution with these two parameters.

Figure 6.2 Graph of normal distribution.

First, though, let us show that the formula of Definition 6.6 can serve as a probability density. Since the values of $n(x; \mu, \sigma)$ are evidently positive so long as $\sigma > 0$, we must show that the total area under the curve is equal to 1. Integrating from $-\infty$ to $\infty$ and making the substitution $z = \dfrac{x - \mu}{\sigma}$, we get

$$\int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac12\left(\frac{x - \mu}{\sigma}\right)^2}dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac12 z^2}\,dz = \frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-\frac12 z^2}\,dz$$

Then, since the integral on the right equals $\dfrac{\Gamma(\frac12)}{\sqrt2} = \dfrac{\sqrt{\pi}}{\sqrt2}$ according to Exercise 6 on page 217, it follows that the total area under the curve is equal to $\dfrac{2}{\sqrt{2\pi}}\cdot\dfrac{\sqrt{\pi}}{\sqrt2} = 1$.
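The total-area argument can be confirmed numerically for particular values of $\mu$ and $\sigma$. The sketch codes $n(x; \mu, \sigma)$ and integrates it with a midpoint rule over $\mu \pm 12\sigma$. The omitted tails are negligible there. The truncation width and the parameter values are illustrative choices.

```python
import math

def normal_pdf(x, mu, sigma):
    """n(x; mu, sigma) of Definition 6.6."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area(mu, sigma, n=200_000, width=12.0):
    """Midpoint rule over mu +/- width*sigma; the omitted tails are negligible."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / n
    return sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(n)) * h

print(area(0, 1), area(4.35, 0.59))     # both very close to 1
```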


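Next, let us show that

THEOREM 6.6 The moment-generating function of the normal distribution is given by

$$M_x(t) = e^{\mu t + \frac12\sigma^2 t^2}$$

Proof. By definition,

$$\begin{aligned}M_x(t) &= \int_{-\infty}^{\infty}e^{xt}\cdot\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac12\left(\frac{x - \mu}{\sigma}\right)^2}dx\\ &= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{1}{2\sigma^2}\left[-2xt\sigma^2 + (x - \mu)^2\right]}dx\end{aligned}$$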
and if we complete the square, that is, use the identity

$$-2xt\sigma^2 + (x - \mu)^2 = \left[x - (\mu + t\sigma^2)\right]^2 - 2\mu t\sigma^2 - t^2\sigma^4$$

we get

$$M_x(t) = e^{\mu t + \frac12 t^2\sigma^2}\left\{\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac12\left[\frac{x - (\mu + t\sigma^2)}{\sigma}\right]^2}dx\right\}$$

Since the quantity inside the braces is the integral, from $-\infty$ to $\infty$, of a normal density with the parameters $\mu + t\sigma^2$ and $\sigma$, and hence equal to 1, it follows that

$$M_x(t) = e^{\mu t + \frac12\sigma^2 t^2}$$

■


Twice differentiating $M_x(t)$ with respect to $t$, we get

$$M_x'(t) = (\mu + \sigma^2 t)\cdot M_x(t)$$

$$M_x''(t) = \left[(\mu + \sigma^2 t)^2 + \sigma^2\right]\cdot M_x(t)$$

so that $M_x'(0) = \mu$ and $M_x''(0) = \mu^2 + \sigma^2$. Thus, $E(x) = \mu$ and $\operatorname{var}(x) = (\mu^2 + \sigma^2) - \mu^2 = \sigma^2$, which verifies that the parameters $\mu$ and $\sigma$ are, indeed, the mean and the standard deviation of the normal distribution.
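The two derivatives at $t = 0$ can also be approximated by central finite differences of $M_x(t)$. This gives a quick numerical cross-check that $M_x'(0) = \mu$ and $M_x''(0) - [M_x'(0)]^2 = \sigma^2$. The step size $h = 10^{-4}$ and the values $\mu = 1.5$, $\sigma = 2$ are illustrative assumptions.

```python
import math

def normal_mgf(t, mu, sigma):
    """Theorem 6.6: M_x(t) = exp(mu*t + sigma^2 t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

mu, sigma, h = 1.5, 2.0, 1e-4
# central differences approximating M'(0) and M''(0)
d1 = (normal_mgf(h, mu, sigma) - normal_mgf(-h, mu, sigma)) / (2 * h)
d2 = (normal_mgf(h, mu, sigma) - 2 * normal_mgf(0, mu, sigma) + normal_mgf(-h, mu, sigma)) / h ** 2

print(d1, mu)                      # E(x) = mu
print(d2 - d1 ** 2, sigma ** 2)    # var(x) = sigma^2
```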

Since the normal distribution plays a basic role in statistics and its density cannot be integrated directly, its areas have been tabulated for the special case where $\mu = 0$ and $\sigma = 1$.


DEFINITION 6.7 The normal distribution with $\mu = 0$ and $\sigma = 1$ is referred to as the standard normal distribution.


The entries in Table III, represented by the shaded area of Figure 6.3, are the values of

$$\int_0^z\frac{1}{\sqrt{2\pi}}e^{-\frac12 x^2}\,dx$$

namely, the probabilities that a random variable having the standard normal distribution will take on a value on the interval from 0 to z, for z = 0.00, 0.01, 0.02, ..., 3.08, and 3.09, and also z = 4.0, z = 5.0, and z = 6.0. By virtue of the symmetry of the normal distribution about its mean, it is unnecessary to extend Table III to negative values of z.
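Entries of the kind found in Table III can be reproduced with the standard identity $\int_0^z \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = \tfrac12\operatorname{erf}(z/\sqrt2)$ and Python's `math.erf`. The helper name `table_iii` is hypothetical, and the rounding to four decimals matches the precision of the entries quoted in this section.

```python
import math

def table_iii(z):
    """Area under the standard normal curve from 0 to z (z >= 0), via the error function."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z in (0.25, 0.88, 1.00, 1.72):
    print(f"{z:.2f}  {table_iii(z):.4f}")
```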
Figure 6.3 Tabulated areas under the standard normal distribution.

Occasionally, we are required to find a value of z corresponding to a specified probability that falls between values listed in Table III (see Exercises 17 and 19 on pages 237 and 238). For convenience, we shall always choose the z value corresponding to the tabular value that comes closest to the specified probability. However, if the given probability falls midway between tabular values, we shall choose for z the value falling midway between the corresponding values of z. For instance, to find the value of z which corresponds to a probability of 0.3512, which falls between 0.3508 and 0.3531 in Table III, we choose z = 1.04 since 0.3508 is closer to 0.3512. On the other hand, for a probability of 0.2533, which falls midway between 0.2517 and 0.2549, we take z = 0.685.
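The lookup rule described above can be sketched in Python. This sketch assumes that the table is rebuilt from `math.erf` and rounded to four decimals, so it stands in for the printed Table III. Entries are stored as integers in units of 0.0001, so the "exactly midway" case is detected without floating-point ties going astray. Both helper names are hypothetical.

```python
import math

def table_iii_entry(z):
    """Table III entry for z, as an integer number of ten-thousandths (e.g. 0.3508 -> 3508)."""
    return round(0.5 * math.erf(z / math.sqrt(2)) * 10_000)

def z_for_probability(p):
    """Pick z for an area p between 0 and z, using the closest-entry / midway rule."""
    target = round(p * 10_000)
    for k in range(309):                      # brackets from (0.00, 0.01) up to (3.08, 3.09)
        lo = table_iii_entry(k / 100)
        hi = table_iii_entry((k + 1) / 100)
        if lo <= target <= hi:
            if target - lo < hi - target:
                return k / 100
            if hi - target < target - lo:
                return (k + 1) / 100
            return (2 * k + 1) / 200          # exactly midway between two tabular values
    raise ValueError("probability lies outside the range covered by the table")

print(z_for_probability(0.3512), z_for_probability(0.2533))
```

For the two probabilities discussed in the text, this returns 1.04 and 0.685.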


EXAMPLE 6.2 


Find the probabilities that a random variable having the standard normal distribu- 
tion will take on a value (a) less than 1.72, (b) less than —0.88, (c) between 1.30 
and 1.75, and (d) between —0.25 and 0.45. 


Solution 


(a) We look up the entry corresponding to z = 1.72 in Table III, add 0.5000 (see Figure 6.4), and get 0.4573 + 0.5000 = 0.9573;

(b) we look up the entry corresponding to z = 0.88 in Table III, subtract it from 0.5000 (see Figure 6.4), and get 0.5000 − 0.3106 = 0.1894;

(c) we look up the entries corresponding to z = 1.75 and z = 1.30 in Table III, subtract the second from the first (see Figure 6.4), and get 0.4599 − 0.4032 = 0.0567;

(d) we look up the entries corresponding to z = 0.25 and z = 0.45 in Table III, add them (see Figure 6.4), and get 0.0987 + 0.1736 = 0.2723. ▲
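The four answers of Example 6.2 can be recomputed with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). Its `cdf` method gives $P(z < c)$ directly, so no symmetry argument is needed. The dictionary labels are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1
answers = {
    "(a) z < 1.72": Z.cdf(1.72),
    "(b) z < -0.88": Z.cdf(-0.88),
    "(c) 1.30 < z < 1.75": Z.cdf(1.75) - Z.cdf(1.30),
    "(d) -0.25 < z < 0.45": Z.cdf(0.45) - Z.cdf(-0.25),
}
for label, p in answers.items():
    print(label, round(p, 4))
```

Because the Table III entries are themselves rounded, the exact values can differ from the table-based answers in the fourth decimal place. This happens in (d), where the exact value rounds to 0.2724.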


Figure 6.4 Diagrams for Example 6.2.

To determine probabilities relating to random variables having normal distributions other than the standard normal distribution, we make use of the following theorem:


THEOREM 6.7 If x has a normal distribution with the mean $\mu$ and the standard deviation $\sigma$, then

$$z = \frac{x - \mu}{\sigma}$$

has the standard normal distribution.

Proof. Since the relationship between the values of x and z is linear, z must take on a value between $z_1 = \dfrac{x_1 - \mu}{\sigma}$ and $z_2 = \dfrac{x_2 - \mu}{\sigma}$ when x takes on a value between $x_1$ and $x_2$. Hence, we can write

$$\begin{aligned}P(x_1 < x < x_2) &= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{x_1}^{x_2}e^{-\frac12\left(\frac{x - \mu}{\sigma}\right)^2}dx\\ &= \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2}e^{-\frac12 z^2}\,dz\\ &= \int_{z_1}^{z_2}n(z; 0, 1)\,dz\\ &= P(z_1 < z < z_2)\end{aligned}$$

where z is seen to be a random variable having the standard normal distribution. ■


Thus, to use Table III in connection with any random variable having a normal distribution, we perform the change of scale $z = \dfrac{x - \mu}{\sigma}$.


EXAMPLE 6.3 


Suppose that the amount of cosmic radiation to which a person is exposed when 
flying by jet across the United States is a random variable having a normal 
distribution with a mean of 4.35 mrem and a standard deviation of 0.59 mrem. 
What is the probability that a person will be exposed to more than 5.20 mrem of 
cosmic radiation on such a flight? 


Solution

We look up the entry corresponding to $z = \dfrac{5.20 - 4.35}{0.59} = 1.44$ in Table III, subtract it from 0.5000 (see Figure 6.5), and get 0.5000 − 0.4251 = 0.0749. ▲


Figure 6.5 Diagram for Example 6.3. 
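Theorem 6.7 says it makes no difference whether we standardize first and use the standard normal distribution, or work with the original normal distribution directly. The sketch below does Example 6.3 both ways with `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, x = 4.35, 0.59, 5.20          # mrem, from Example 6.3
z = (x - mu) / sigma                     # Theorem 6.7: change of scale
p_standardized = 1 - NormalDist().cdf(z)
p_direct = 1 - NormalDist(mu, sigma).cdf(x)
print(round(z, 2), p_standardized, p_direct)
```

Both probabilities agree, and they differ slightly from 0.0749 only because the text rounds z to 1.44 before using the table.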


6.6 THE NORMAL APPROXIMATION 
TO THE BINOMIAL DISTRIBUTION 


The normal distribution is sometimes introduced as a continuous distribution which provides a close approximation to the binomial distribution when n, the number of trials, is very large and $\theta$, the probability of a success on an individual trial, is close to $\tfrac12$. Figure 6.6 shows the histograms of binomial distributions with $\theta = \tfrac12$ and n = 2, 5, 10, and 25, and it can be seen that with increasing n these distributions approach the symmetrical bell-shaped pattern of the normal distribution.


Figure 6.6 Binomial distributions with $\theta = \tfrac12$.


To provide a theoretical foundation for this argument, let us first prove the 
following theorem: 


THEOREM 6.8 If x is a random variable having a binomial distribution with the parameters n and $\theta$, then the moment-generating function of

$$z = \frac{x - n\theta}{\sqrt{n\theta(1 - \theta)}}$$

approaches that of the standard normal distribution when $n \to \infty$.

Proof. Making use of Theorems 4.10 and 5.4, we can write

$$M_z(t) = M_{\frac{x - \mu}{\sigma}}(t) = e^{-\mu t/\sigma}\cdot\left[1 + \theta\left(e^{t/\sigma} - 1\right)\right]^n$$
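where $\mu = n\theta$ and $\sigma = \sqrt{n\theta(1 - \theta)}$. Then, taking logarithms and substituting the Maclaurin's series of $e^{t/\sigma}$, we get

$$\begin{aligned}\ln M_z(t) &= -\frac{\mu t}{\sigma} + n\cdot\ln\left[1 + \theta\left(e^{t/\sigma} - 1\right)\right]\\ &= -\frac{\mu t}{\sigma} + n\cdot\ln\left[1 + \theta\left\{\frac{t}{\sigma} + \frac12\left(\frac{t}{\sigma}\right)^2 + \frac16\left(\frac{t}{\sigma}\right)^3 + \cdots\right\}\right]\end{aligned}$$

and, using the infinite series $\ln(1 + x) = x - \frac12 x^2 + \frac13 x^3 - \cdots$, which converges for $|x| < 1$, to expand this logarithm, it follows that

$$\begin{aligned}\ln M_z(t) = -\frac{\mu t}{\sigma} &+ n\theta\left[\frac{t}{\sigma} + \frac12\left(\frac{t}{\sigma}\right)^2 + \frac16\left(\frac{t}{\sigma}\right)^3 + \cdots\right]\\ &- \frac{n\theta^2}{2}\left[\frac{t}{\sigma} + \frac12\left(\frac{t}{\sigma}\right)^2 + \frac16\left(\frac{t}{\sigma}\right)^3 + \cdots\right]^2\\ &+ \frac{n\theta^3}{3}\left[\frac{t}{\sigma} + \frac12\left(\frac{t}{\sigma}\right)^2 + \frac16\left(\frac{t}{\sigma}\right)^3 + \cdots\right]^3 - \cdots\end{aligned}$$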
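Collecting powers of t, we obtain

$$\begin{aligned}\ln M_z(t) &= \left(-\frac{\mu}{\sigma} + \frac{n\theta}{\sigma}\right)t + \left(\frac{n\theta}{2\sigma^2} - \frac{n\theta^2}{2\sigma^2}\right)t^2 + \left(\frac{n\theta}{6\sigma^3} - \frac{n\theta^2}{2\sigma^3} + \frac{n\theta^3}{3\sigma^3}\right)t^3 + \cdots\\ &= \frac{1}{\sigma^2}\left(\frac{n\theta - n\theta^2}{2}\right)t^2 + \frac{n}{\sigma^3}\left(\frac{\theta - 3\theta^2 + 2\theta^3}{6}\right)t^3 + \cdots\end{aligned}$$

since $\mu = n\theta$. Then, substituting $\sigma = \sqrt{n\theta(1 - \theta)}$, we find that

$$\ln M_z(t) = \frac12 t^2 + \frac{n}{\sigma^3}\left(\frac{\theta - 3\theta^2 + 2\theta^3}{6}\right)t^3 + \cdots$$

where for $r > 2$ the coefficient of $t^r$ is a constant times $\dfrac{n}{\sigma^r}$, which approaches 0 when $n \to \infty$. It follows that

$$\lim_{n \to \infty}\ln M_z(t) = \frac12 t^2$$

and since the limit of a logarithm equals the logarithm of the limit (provided the two limits exist), we conclude that

$$\lim_{n \to \infty}M_z(t) = e^{\frac12 t^2}$$

which is the moment-generating function of Theorem 6.6 with $\mu = 0$ and $\sigma = 1$. ■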
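The convergence in Theorem 6.8 can be watched numerically using the closed form $\ln M_z(t) = -\mu t/\sigma + n\ln[1 + \theta(e^{t/\sigma} - 1)]$ from the proof. The sketch evaluates it for growing n and measures its distance from $\tfrac12 t^2$. It uses `math.log1p` and `math.expm1` to limit rounding error. The values $t = 1$, $\theta = 0.3$ and the function name are illustrative. Since the leading error term is a constant times $n/\sigma^3$, the gap should shrink roughly like $1/\sqrt{n}$ when $\theta \ne \tfrac12$.

```python
import math

def log_mgf_standardized_binomial(t, n, theta):
    """ln M_z(t) for z = (x - n*theta) / sqrt(n*theta*(1 - theta)), with x binomial(n, theta)."""
    mu = n * theta
    sigma = math.sqrt(n * theta * (1 - theta))
    # ln M_z(t) = -mu*t/sigma + n * ln[1 + theta*(e^{t/sigma} - 1)]
    return -mu * t / sigma + n * math.log1p(theta * math.expm1(t / sigma))

t, theta = 1.0, 0.3
gaps = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    gaps[n] = abs(log_mgf_standardized_binomial(t, n, theta) - 0.5 * t ** 2)
    print(n, gaps[n])    # distance from ln e^{t^2/2} = t^2/2
```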


This completes the proof of Theorem 6.8, but have we shown that when $n \to \infty$ the distribution of z, the standardized binomial random variable, approaches the standard normal distribution? Not quite. To this end, we must refer to two theorems which we shall state here without proof:

1. There is a one-to-one correspondence between moment-generating functions and probability distributions (densities) when the former exist.

2. If the moment-generating function of one random variable approaches that of another random variable, then the distribution (density) of the first random variable approaches that of the second random variable under the same limiting conditions.

Strictly speaking, our results apply only when $n \to \infty$, but the normal distribution is often used to approximate binomial probabilities even when n is fairly small. A good rule of thumb is to use this approximation only when $n\theta$ and $n(1 - \theta)$ are both greater than 5.


EXAMPLE 6.4 


Find the probability of getting 6 heads and 10 tails in 16 tosses of a balanced 
coin, and also use the normal distribution to approximate this probability. 


Solution 


Substituting x = 6, n = 16, and $\theta = \tfrac12$ into the formula for the binomial distribution, we get

$$b(6; 16, \tfrac12) = \binom{16}{6}\left(\tfrac12\right)^{6}\left(1 - \tfrac12\right)^{10} = \frac{8{,}008}{65{,}536}$$

or 0.1222 rounded to four decimals. To find the normal approximation to this probability, we must use the continuity correction, according to which each non-negative integer k is represented by the interval from $k - \tfrac12$ to $k + \tfrac12$, and, hence, we must determine the area under the curve between 5.5 and 6.5 (see Figure 6.7). Since $\mu = 16\cdot\tfrac12 = 8$ and $\sigma = \sqrt{16\cdot\tfrac12\cdot\tfrac12} = 2$, we must determine the area under the standard normal curve between

$$z = \frac{5.5 - 8}{2} = -1.25 \quad\text{and}\quad z = \frac{6.5 - 8}{2} = -0.75$$

Figure 6.7 Diagram for Example 6.4.

Since the entries in Table III corresponding to z = 1.25 and z = 0.75 are 0.3944 and 0.2734, we find that the normal approximation to the probability of "6 heads and 10 tails in 16 tosses of a balanced coin" is 0.3944 − 0.2734 = 0.1210. This is very close, indeed, to 0.1222, the correct value of this probability rounded to four decimals. ▲
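Example 6.4 can be reproduced by computing the exact binomial probability with `math.comb` and the continuity-corrected normal approximation with `statistics.NormalDist` (both available in Python 3.8 or later). The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, theta, k = 16, 0.5, 6
exact = math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)     # 8,008 / 65,536

mu, sigma = n * theta, math.sqrt(n * theta * (1 - theta))          # 8 and 2
N = NormalDist(mu, sigma)
approx = N.cdf(k + 0.5) - N.cdf(k - 0.5)                           # area from 5.5 to 6.5

print(round(exact, 4), round(approx, 4))
```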


EXAMPLE 6.5 


Use the normal approximation to the binomial distribution to find the probability 
that at least 70 of 100 mosquitos will be killed by a new insect spray, when the 
probability is 0.75 that any one of them will be killed by the spray. 


Solution 


Using the same continuity correction as in Example 6.4, we must find the area under the curve to the right of 69.5 (see Figure 6.8), and since $\mu = 100(0.75) = 75$ and $\sigma = \sqrt{100(0.75)(0.25)} = 4.33$, we must find the area under the standard normal curve to the right of

$$z = \frac{69.5 - 75}{4.33} = -1.27$$

Figure 6.8 Diagram for Example 6.5.


Since the entry in Table III corresponding to z = 1.27 is 0.3980, the answer is 0.3980 + 0.5000 = 0.8980. Comparison with the correct value of this probability (looked up in the tables by Romig listed on page 207) shows that the error of the approximation is only 0.0018. ▲
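The exact binomial probability in Example 6.5 can be computed directly, rather than looked up in Romig's tables, by summing $b(k; 100, 0.75)$ for $k = 70, \ldots, 100$. The sketch compares that sum with the continuity-corrected normal approximation. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, theta = 100, 0.75
# exact P(x >= 70) as a sum of binomial probabilities
exact = sum(math.comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(70, n + 1))

mu, sigma = n * theta, math.sqrt(n * theta * (1 - theta))   # 75 and about 4.33
approx = 1 - NormalDist(mu, sigma).cdf(69.5)                # continuity correction

print(round(exact, 4), round(approx, 4), round(abs(exact - approx), 4))
```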


6.7 THE BIVARIATE NORMAL 
DISTRIBUTION 


Among multivariate densities, of special importance is the multivariate normal distribution, which is a generalization of the normal distribution in one variable. As it is best (indeed, virtually necessary) to present this distribution in matrix notation, we shall give here only the bivariate case; discussions of the general case are referred to on page 239.


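DEFINITION 6.8 A pair of random variables x and y have a bivariate normal distribution, and they are referred to as jointly normally distributed random variables, if and only if their joint probability density is given by

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right]}$$

for $-\infty < x < \infty$ and $-\infty < y < \infty$, where $\sigma_1 > 0$, $\sigma_2 > 0$, and $-1 < \rho < 1$.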
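To study this distribution, let us first show that the parameters $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ are, respectively, the means and the standard deviations of the random variables x and y. Integrating on y, from $-\infty$ to $\infty$, to obtain the marginal density of x, we can write

$$g(x) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}\left(\frac{x - \mu_1}{\sigma_1}\right)^2}\int_{-\infty}^{\infty}e^{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{y - \mu_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right)\right]}dy$$

Temporarily making the substitution $u = \dfrac{x - \mu_1}{\sigma_1}$ to simplify the notation and changing the variable of integration by letting $v = \dfrac{y - \mu_2}{\sigma_2}$, we obtain

$$g(x) = \frac{e^{-\frac{1}{2(1 - \rho^2)}u^2}}{2\pi\sigma_1\sqrt{1 - \rho^2}}\int_{-\infty}^{\infty}e^{-\frac{1}{2(1 - \rho^2)}\left(v^2 - 2\rho uv\right)}dv$$

and, after completing the square by letting $v^2 - 2\rho uv = (v - \rho u)^2 - \rho^2u^2$ and collecting terms, this becomes

$$g(x) = \frac{e^{-\frac12 u^2}}{\sigma_1\sqrt{2\pi}}\left(\frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\int_{-\infty}^{\infty}e^{-\frac12\left(\frac{v - \rho u}{\sqrt{1 - \rho^2}}\right)^2}dv\right)$$

Finally, identifying the quantity in parentheses as the integral of a normal density from $-\infty$ to $\infty$ and, hence, equalling 1, we get

$$g(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\,e^{-\frac12\left(\frac{x - \mu_1}{\sigma_1}\right)^2}$$

for $-\infty < x < \infty$. It follows by inspection that the marginal density of x is a normal distribution with the mean $\mu_1$ and the standard deviation $\sigma_1$, and, by symmetry, that the marginal density of y is a normal distribution with the mean $\mu_2$ and the standard deviation $\sigma_2$.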

So far as the parameter $\rho$ is concerned, where $\rho$ is the lowercase Greek letter rho, it is called the correlation coefficient, and the necessary integration will show that $\operatorname{cov}(x, y) = \rho\sigma_1\sigma_2$. Thus, the parameter $\rho$ measures how the random variables x and y vary together, and its significance will be discussed further in Chapter 14.
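A simulation makes the roles of the five parameters concrete. The sketch draws pairs using a standard construction that is not given in this text: $x = \mu_1 + \sigma_1 z_1$ and $y = \mu_2 + \sigma_2(\rho z_1 + \sqrt{1 - \rho^2}\,z_2)$, with $z_1$ and $z_2$ independent standard normal. Pairs built this way have the bivariate normal density of Definition 6.8. The sketch then checks the sample means and the sample covariance against $\mu_1$, $\mu_2$, and $\rho\sigma_1\sigma_2$. The parameter values, seed, and helper name are illustrative assumptions.

```python
import math
import random

def sample_bivariate_normal(mu1, mu2, s1, s2, rho, rng):
    """One draw built from two independent standard normals (a standard construction)."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x = mu1 + s1 * z1
    y = mu2 + s2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return x, y

rng = random.Random(2024)                       # fixed seed for reproducibility
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6
pairs = [sample_bivariate_normal(mu1, mu2, s1, s2, rho, rng) for _ in range(100_000)]

mx = sum(x for x, _ in pairs) / len(pairs)
my = sum(y for _, y in pairs) / len(pairs)
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
print(mx, my)                  # close to mu1 = 1.0 and mu2 = -2.0
print(cov, rho * s1 * s2)      # close to rho*sigma1*sigma2 = 0.6
```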

When we deal with a pair of random variables having a bivariate normal distribution, their conditional densities are also of importance; so, let us prove the following theorem:
THEOREM 6.9 If x and y have a bivariate normal distribution, the conditional density of y given $\mathbf{x} = x$ is a normal distribution with the mean

$$\mu_{y|x} = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)$$

and the variance

$$\sigma^2_{y|x} = \sigma_2^2(1 - \rho^2)$$

and the conditional density of x given $\mathbf{y} = y$ is a normal distribution with the mean

$$\mu_{x|y} = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2)$$

and the variance

$$\sigma^2_{x|y} = \sigma_1^2(1 - \rho^2)$$
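Proof. Writing $w(y \mid x) = \dfrac{f(x, y)}{g(x)}$ in accordance with Definition 3.13, and letting $u = \dfrac{x - \mu_1}{\sigma_1}$ and $v = \dfrac{y - \mu_2}{\sigma_2}$ to simplify the notation, we get

$$\begin{aligned}w(y \mid x) &= \frac{\dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}\left[u^2 - 2\rho uv + v^2\right]}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-\frac12 u^2}}\\ &= \frac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1 - \rho^2}}\,e^{-\frac{1}{2(1 - \rho^2)}\left[v^2 - 2\rho uv + \rho^2u^2\right]}\\ &= \frac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1 - \rho^2}}\,e^{-\frac12\left[\frac{v - \rho u}{\sqrt{1 - \rho^2}}\right]^2}\end{aligned}$$

Then, expressing this result in terms of the original variables, we obtain

$$w(y \mid x) = \frac{1}{\sigma_2\sqrt{2\pi}\sqrt{1 - \rho^2}}\,e^{-\frac12\left[\frac{y - \left\{\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right\}}{\sigma_2\sqrt{1 - \rho^2}}\right]^2}$$

for $-\infty < y < \infty$, and it can be seen by inspection that this is a normal density with the mean $\mu_{y|x} = \mu_2 + \rho\dfrac{\sigma_2}{\sigma_1}(x - \mu_1)$ and the variance $\sigma^2_{y|x} = \sigma_2^2(1 - \rho^2)$. The corresponding results for the conditional density of x given $\mathbf{y} = y$ follow by symmetry. ■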
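Theorem 6.9 can be verified pointwise. At a fixed x, the ratio $f(x, y)/g(x)$ computed from Definition 6.8 and the normal marginal of x should equal a normal density in y with the stated conditional mean and variance. The sketch compares the two at a few values of y. The parameter values, the choice x = 2.5, and the helper names are illustrative.

```python
import math

def bivariate_pdf(x, y, mu1, mu2, s1, s2, rho):
    """Joint density of Definition 6.8."""
    u, v = (x - mu1) / s1, (y - mu2) / s2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho ** 2)
    return math.exp(-0.5 * q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

def normal_pdf(x, mu, sigma):
    """n(x; mu, sigma) of Definition 6.6."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6
x = 2.5
cond_mean = mu2 + rho * s2 / s1 * (x - mu1)        # mu_{y|x} from Theorem 6.9
cond_sd = s2 * math.sqrt(1 - rho ** 2)             # sigma_{y|x}

for y in (-3.0, -2.0, -1.5, -0.5):
    w = bivariate_pdf(x, y, mu1, mu2, s1, s2, rho) / normal_pdf(x, mu1, s1)   # f(x, y) / g(x)
    print(y, w, normal_pdf(y, cond_mean, cond_sd))   # the two columns agree
```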
The bivariate normal distribution has many important properties, some statistical and some purely mathematical. Among the former, there is the following property, which the reader will be asked to prove in Exercise 9 on page 236:

THEOREM 6.10 If two random variables have a bivariate normal distribution, they are independent if and only if $\rho = 0$.


In connection with this, if $\rho = 0$ the random variables are said to be uncorrelated.

Also, we have shown that for two random variables having a bivariate 
normal distribution the two marginal densities are normal, but the converse is 
not necessarily true. In other words, the marginal distributions may both be 
normal without the joint distribution being a bivariate normal distribution. For 
instance, if the bivariate density of x and y is given by 


$$f^*(x, y) = \begin{cases} 2f(x, y) & \text{inside squares 2 and 4 of Figure 6.9} \\ 0 & \text{inside squares 1 and 3 of Figure 6.9} \\ f(x, y) & \text{elsewhere} \end{cases}$$

where f(x, y) is the value of the bivariate normal density with μ₁ = 0, μ₂ = 0, and ρ = 0 at (x, y), it is easy to see that the marginal densities of x and y are normal even though their joint density is not a bivariate normal distribution.


Figure 6.9 Sample space for the bivariate density given by f*(x, y).
Many interesting properties of the bivariate normal density are obtained by studying the bivariate normal surface, pictured in Figure 6.10, whose equation is z = f(x, y), where f(x, y) is the value of the bivariate normal density at (x, y). As the reader will be asked to verify in the exercises that follow, the bivariate normal surface has a maximum at (μ₁, μ₂), any plane parallel to the z-axis intersects the surface in a curve having the shape of a normal distribution, and any plane parallel to the xy-plane which intersects the surface intersects it in an ellipse called a contour of constant probability density. When ρ = 0 and σ₁ = σ₂, the contours of constant probability density are circles, and it is customary to refer to the corresponding joint density as a circular normal distribution.
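The statements about the bivariate normal surface can be explored numerically. This minimal Python sketch uses hypothetical function names and parameter values. It evaluates the density in terms of the standardized variables u = (x − μ₁)/σ₁ and v = (y − μ₂)/σ₂, and it finds the grid point with the largest density. It also compares density values around a circle centered at (μ₁, μ₂). These values are all equal when ρ = 0 and σ₁ = σ₂, as a circular contour requires, and they differ when ρ ≠ 0.

```python
import math


def bivariate_normal_pdf(x, y, mu1, mu2, sigma1, sigma2, rho):
    """Value of the bivariate normal density at (x, y)."""
    u = (x - mu1) / sigma1
    v = (y - mu2) / sigma2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho ** 2)
    norm = 2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - rho ** 2)
    return math.exp(-q / 2) / norm


def grid_argmax(params, lo=-5.0, hi=5.0, steps=101):
    """Grid point (x, y) at which the density is largest."""
    h = (hi - lo) / (steps - 1)
    pts = [(lo + i * h, lo + j * h) for i in range(steps) for j in range(steps)]
    return max(pts, key=lambda p: bivariate_normal_pdf(p[0], p[1], *params))


def values_on_circle(params, r, k=8):
    """Density at k equally spaced points on a circle of radius r about (mu1, mu2)."""
    mu1, mu2 = params[0], params[1]
    return [bivariate_normal_pdf(mu1 + r * math.cos(2 * math.pi * i / k),
                                 mu2 + r * math.sin(2 * math.pi * i / k), *params)
            for i in range(k)]


if __name__ == "__main__":
    print(grid_argmax((1.0, -2.0, 1.5, 0.8, 0.6)))  # (1.0, -2.0) up to rounding
    circ = values_on_circle((0.0, 0.0, 2.0, 2.0, 0.0), r=1.5)
    print(max(circ) - min(circ))  # essentially 0: the contour is a circle
    ell = values_on_circle((0.0, 0.0, 2.0, 2.0, 0.6), r=1.5)
    print(max(ell) - min(ell))    # clearly positive when rho != 0
```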


Figure 6.10 Bivariate normal surface. 


THEORETICAL EXERCISES 


1. Show that the normal distribution has a relative maximum at x = μ and inflection points at x = μ − σ and x = μ + σ.

2. Show that the differential equation of Exercise 22 on page 219 yields a normal distribution when b = c = 0 and a > 0.

3. Twice more differentiating the moment-generating function of the normal distribution (see page 223), verify that μ₃ = 0 and μ₄ = 3σ⁴.

4. Use the moment-generating function of the normal distribution given in Theorem 6.6 to show that for the normal distribution α₃ = 0 and α₄ = 3, where α₃ and α₄ are as defined in Exercises 10 and 11 on pages 157 and 158.

5. Prove Theorem 6.7 on page 225 by showing that the moment-generating function of

$$z = \frac{x - \mu}{\sigma}$$

is the moment-generating function of the standard normal distribution.
6. If we let $K_x(t) = \ln M_{x-\mu}(t)$, the coefficient of $\frac{t^r}{r!}$ in the Maclaurin's series of $K_x(t)$ is called the rth cumulant and it is denoted by $\kappa_r$. Equating coefficients of like powers, show that

(a) $\kappa_2 = \mu_2$; (b) $\kappa_3 = \mu_3$;

(c) $\kappa_4 = \mu_4 - 3\mu_2^2$; (d) $\kappa_5 = \mu_5 - 10\mu_3\mu_2$.

Also show that for the normal distribution $\kappa_2 = \sigma^2$ and all other cumulants are zero.


7. Show that when λ → ∞, where λ is the parameter of the Poisson distribution, then the moment-generating function of

$$z = \frac{x - \lambda}{\sqrt{\lambda}}$$

namely, that of a standardized Poisson random variable, approaches the moment-generating function of the standard normal distribution.

8. Show that when α → ∞ and β remains constant, the moment-generating function of a standardized gamma random variable approaches the moment-generating function of the standard normal distribution.


9. Prove Theorem 6.10.

10. Show that any plane perpendicular to the xy-plane intersects the bivariate normal surface in a curve having the shape of a normal distribution.


11. If the exponent of e of a bivariate normal density is

$$-\frac{1}{102}\left[(x + 2)^2 - 2.8(x + 2)(y - 1) + 4(y - 1)^2\right]$$

find
(a) μ₁, μ₂, σ₁, σ₂, and ρ;
(b) $\mu_{y|x}$ and $\sigma_{y|x}$.


12. If the exponent of e of a bivariate normal density is

$$-\frac{1}{54}\left(x^2 + 4y^2 + 2xy + 2x + 8y + 4\right)$$

find σ₁, σ₂, and ρ, given that μ₁ = 0 and μ₂ = −1.


13. If x and y have a bivariate normal distribution, u = x + y, and v = x − y, find an expression for the correlation coefficient of u and v.


14. If x and y have a bivariate normal distribution, it can be shown that the joint moment-generating function (see Exercise 6 on page 172) of these random variables is given by

$$M_{x,y}(t_1, t_2) = E\left(e^{xt_1 + yt_2}\right) = e^{t_1\mu_1 + t_2\mu_2 + \frac{1}{2}\left(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2\right)}$$

Verify that

(a) the first partial derivative of this function with respect to t₁ at t₁ = 0 and t₂ = 0 is μ₁;

(b) the second partial derivative with respect to t₁ at t₁ = 0 and t₂ = 0 is σ₁² + μ₁²;

(c) the second partial derivative with respect to t₁ and t₂ at t₁ = 0 and t₂ = 0 is ρσ₁σ₂ + μ₁μ₂.


APPLIED EXERCISES 


15. If z is a random variable having the standard normal distribution, find the probabilities that this random variable will take on a value

(a) greater than 1.14; (b) less than −0.36;

(c) between −0.46 and −0.09; (d) between −0.58 and 1.12.

16. If x is a random variable having a normal distribution, what are the probabilities of getting a value

(a) within one standard deviation of the mean; 

(b) within two standard deviations of the mean; 

(c) within three standard deviations of the mean; 

(d) within four standard deviations of the mean? 


17. If $z_\alpha$ is defined by

$$\int_{z_\alpha}^{\infty} n(z; 0, 1)\,dz = \alpha$$

find its values for

(a) α = 0.05; (b) α = 0.025;

(c) α = 0.01; (d) α = 0.005.

18. Suppose that during periods of transcendental meditation the reduction of a person's oxygen consumption is a random variable having a normal distribution with μ = 37.6 cc per minute and σ = 4.6 cc per minute. Find the probabilities that during a period of transcendental meditation a person's oxygen consumption will be reduced by

(a) at least 44.5 cc per minute;

(b) at most 35.0 cc per minute;

(c) anywhere from 30.0 to 40.0 cc per minute.
19. Suppose that the actual amount of instant coffee which a filling machine puts into “6-ounce” jars is a random variable having a normal distribution with σ = 0.05 ounce. If only 3 percent of the jars are to contain less than 6 ounces of coffee, what must be the mean fill of these jars?


20. Use the normal approximation to find the probability of getting 7 heads and 7 tails in 14 flips of a balanced coin, and compare the result with the exact value (rounded to four decimals) given in Table I.


21. If 23 percent of all patients with high blood pressure have bad side effects 
from a certain kind of medicine, use the normal approximation to find the 
probability that among 120 patients with high blood pressure treated with 
this medicine more than 32 will have bad side effects. 

22. To illustrate the law of large numbers (see also Exercise 20 on page 186), 
use the normal approximation to the binomial distribution to find the prob- 
abilities that the proportion of heads will be anywhere from 0.49 to 0.51 
when a balanced coin is flipped 
(a) 100 times; (b) 1,000 times; (c) 10,000 times. 

23. The center of a target is taken as the origin of a rectangular system of coordinates, with reference to which the point of impact of a missile has the coordinates x and y. If x and y have a bivariate normal density with μ₁ = 0, μ₂ = 0, σ₁ = 120 feet, σ₂ = 120 feet, and ρ = 0, find the probabilities that the point of impact will be
(a) inside a square with sides of 180 feet, whose center is at the origin and whose sides are parallel to the coordinate axes;
(b) inside a circle with a radius of 75 feet with its center at the origin.

24. If x and y have the circular normal distribution with μ₁ = μ₂ = 0 and σ₁ = σ₂ = 12, find
(a) the probability of getting a point (x, y) inside the circle x² + y² = 36;
(b) the value of c for which the probability of getting a point (x, y) inside the circle x² + y² = c² is 0.80.

25. Suppose that x and y, the height and weight of certain animals, have a bivariate normal distribution with μ₁ = 18 inches, μ₂ = 15 pounds, σ₁ = 3 inches, σ₂ = 2 pounds, and ρ = 0.75. Find
(a) the expected weight of one of these animals that is 17 inches tall;
(b) the expected height of one of these animals that weighs 20 pounds.
Functions of Random Variables


7.1 INTRODUCTION


In this chapter we shall concern ourselves with the problem of finding the probability distributions or densities of functions of one or more random variables. That is, given a set of random variables x₁, x₂, ..., xₙ and their joint distribution or density, we shall be interested in finding the probability distribution or density of some random variable y = u(x₁, x₂, ..., xₙ). This means that the values of the random variable y are related to those of the x's by means of the equation y = u(x₁, x₂, ..., xₙ).

Several methods are available for solving this kind of problem. The ones we shall discuss in the next four sections are called the distribution function technique, the transformation technique, and the moment-generating function technique. Although all three methods can be used in some situations, in most problems one technique will be preferable (easier to use than the others). This is true, for example, in some instances where the function in question is linear in the random variables x₁, x₂, ..., xₙ, and the moment-generating function technique yields the simplest derivations.

The various techniques we shall discuss in this chapter will be used again in Chapter 8 to derive several distributions that are of fundamental importance in statistical inference.
7.2 DISTRIBUTION FUNCTION TECHNIQUE


A straightforward method of obtaining the probability density of a function of continuous random variables consists of first finding its distribution function, and then its density by differentiation. Thus, if x₁, x₂, ..., xₙ are continuous random variables with a given joint probability density, the probability density of y = u(x₁, x₂, ..., xₙ) is obtained by first determining an expression for the probability

$$F(y) = P(y \le y) = P[u(x_1, x_2, \ldots, x_n) \le y]$$

and then differentiating to get

$$f(y) = \frac{dF(y)}{dy}$$

according to Theorem 3.6.


EXAMPLE 7.1
If the probability density of x is given by

$$f(x) = \begin{cases} 6x(1 - x) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find the probability density of y = x³.


Solution


Letting G(y) denote the value of the distribution function of y at y, we can write

$$G(y) = P(y \le y) = P(x^3 \le y) = P(x \le y^{1/3}) = \int_0^{y^{1/3}} 6x(1 - x)\,dx = 3y^{2/3} - 2y$$

and, hence,

$$g(y) = 2(y^{-1/3} - 1)$$

for 0 < y < 1; elsewhere, g(y) = 0. In Exercise 5 on page 262 the reader will be asked to verify this result by a different technique. ▲
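Example 7.1 can be checked numerically. In this minimal Python sketch, the helper names and the use of Simpson's rule are our own choices, not the text's. Integrating f(x) from 0 to y^(1/3) should reproduce G(y) = 3y^(2/3) − 2y, and a central difference of G should reproduce g(y) = 2(y^(−1/3) − 1).

```python
def f(x):
    """Density of x in Example 7.1."""
    return 6 * x * (1 - x) if 0 < x < 1 else 0.0


def G(y):
    """Distribution function of y = x**3 found in Example 7.1 (0 < y < 1)."""
    return 3 * y ** (2 / 3) - 2 * y


def g(y):
    """Density of y = x**3 found in Example 7.1 (0 < y < 1)."""
    return 2 * (y ** (-1 / 3) - 1)


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    for y in (0.1, 0.3, 0.7):
        by_integration = simpson(f, 0.0, y ** (1 / 3))  # P(x <= y**(1/3))
        step = 1e-5
        by_derivative = (G(y + step) - G(y - step)) / (2 * step)
        print(y, by_integration - G(y), by_derivative - g(y))  # both differences near 0
```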


EXAMPLE 7.2
If y = |x|, show that

$$g(y) = \begin{cases} f(y) + f(-y) & \text{for } y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

where f(x) is the value of the probability density of x at x and g(y) is the value of the probability density of y at y. Also, use this result to find the probability density of y = |x|, where x has the standard normal distribution.


Solution


For y > 0 we have

$$G(y) = P(y \le y) = P(|x| \le y) = P(-y \le x \le y) = F(y) - F(-y)$$

and, upon differentiation,

$$g(y) = f(y) + f(-y)$$

Since |x| cannot be negative, g(y) = 0 for y < 0; the value of g(0) is arbitrarily set equal to 0. If x has the standard normal distribution and y = |x|, it follows that

$$g(y) = n(y; 0, 1) + n(-y; 0, 1) = 2n(y; 0, 1)$$

for y > 0; elsewhere, g(y) = 0. An important application of this result is given in Example 7.9. ▲
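The standard normal case of Example 7.2 can be checked with the Python standard library alone. In this minimal sketch the helper names are hypothetical. Φ is computed from `math.erf`, and the distribution function of y = |x| is F(y) − F(−y). The sketch compares the numerical derivative of this distribution function with 2n(y; 0, 1). A small simulation also compares P(|x| ≤ 1) with F(1) − F(−1).

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def std_normal_pdf(x):
    """n(x; 0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def G_abs(y):
    """Distribution function of y = |x| when x is standard normal."""
    return std_normal_cdf(y) - std_normal_cdf(-y) if y > 0 else 0.0


def g_abs(y):
    """Density of y = |x| from Example 7.2: 2 n(y; 0, 1) for y > 0."""
    return 2 * std_normal_pdf(y) if y > 0 else 0.0


def simulated_G_abs(y, n=100_000, seed=3):
    """Fraction of simulated |x| values that are <= y."""
    rng = random.Random(seed)
    return sum(abs(rng.gauss(0.0, 1.0)) <= y for _ in range(n)) / n


if __name__ == "__main__":
    h = 1e-6
    print((G_abs(0.7 + h) - G_abs(0.7 - h)) / (2 * h), g_abs(0.7))  # agree closely
    print(simulated_G_abs(1.0), G_abs(1.0))  # both about 0.683
```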
EXAMPLE 7.3
If the joint density of x₁ and x₂ is given by

$$f(x_1, x_2) = \begin{cases} 6e^{-3x_1 - 2x_2} & \text{for } x_1 > 0,\ x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find the density function of the random variable y = x₁ + x₂.


Solution


Integrating the joint density over the shaded region of Figure 7.1, we get

$$F(y) = \int_0^y \int_0^{y - x_2} 6e^{-3x_1 - 2x_2}\,dx_1\,dx_2 = 1 + 2e^{-3y} - 3e^{-2y}$$

and, differentiating with respect to y, we obtain

$$f(y) = 6(e^{-2y} - e^{-3y})$$

for y > 0; elsewhere, f(y) = 0. ▲

Figure 7.1 Diagram for Example 7.3 (the line x₁ + x₂ = y in the x₁x₂-plane).
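Example 7.3 can be checked by simulation. The joint density factors as 3e^(−3x₁) · 2e^(−2x₂), so x₁ and x₂ are independent exponential random variables with rates 3 and 2 (means 1/3 and 1/2). The minimal Python sketch below uses hypothetical helper names. It draws x₁ and x₂ with `random.expovariate`, whose argument is the rate, and compares the empirical distribution function of y = x₁ + x₂ with F(y) = 1 + 2e^(−3y) − 3e^(−2y).

```python
import math
import random


def F(y):
    """Distribution function of y = x1 + x2 from Example 7.3."""
    return 1 + 2 * math.exp(-3 * y) - 3 * math.exp(-2 * y) if y > 0 else 0.0


def f(y):
    """Density of y from Example 7.3."""
    return 6 * (math.exp(-2 * y) - math.exp(-3 * y)) if y > 0 else 0.0


def empirical_F(y, n=200_000, seed=0):
    """Fraction of simulated sums x1 + x2 that are <= y."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(3.0) + rng.expovariate(2.0) <= y for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for y in (0.25, 0.5, 1.0, 2.0):
        print(y, round(empirical_F(y), 3), round(F(y), 3))
```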
THEORETICAL EXERCISES

1. If the probability density of x is given by

$$f(x) = \begin{cases} 2xe^{-x^2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and y = x², find
(a) the distribution function of y;
(b) the probability density of y.

2. If x has an exponential distribution with the parameter θ, use the distribution function technique to find the probability density of the random variable y = ln x.

3. If x has the uniform density with the parameters α = 0 and β = 1, use the method of Section 7.2 to find the probability density of the random variable y = √x.

4. If the joint density of x and y is given by

$$f(x, y) = \begin{cases} 4xye^{-(x^2 + y^2)} & \text{for } x > 0,\ y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and z = √(x² + y²), find
(a) the distribution function of z;
(b) the probability density of z.


5. If x₁ and x₂ are independent random variables having exponential densities with the parameters θ₁ and θ₂, use the method of Section 7.2 to find the probability density of y = x₁ + x₂ when

(a) θ₁ ≠ θ₂;

(b) θ₁ = θ₂.

Note that Example 7.3 is a special case of part (a) with θ₁ = 1/3 and θ₂ = 1/2.


6. If x₁ and x₂ are independent random variables having the uniform density with α = 0 and β = 1, refer to Figure 7.2 to find expressions for the distribution function of y = x₁ + x₂ which apply when

(a) y ≤ 0; (b) 0 < y < 1;

(c) 1 ≤ y < 2; (d) y ≥ 2.

Also find the probability density of y.


7. If the joint density of x and y is given by

$$f(x, y) = \begin{cases} e^{-(x + y)} & \text{for } x > 0,\ y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and z = (x + y)/2, find the probability density of z by the method of Section 7.2.
Figure 7.2 Diagram for Exercise 6 (four panels in the x₁x₂-plane showing the regions y = x₁ + x₂ ≤ 0, 0 < y = x₁ + x₂ < 1, 1 < y = x₁ + x₂ < 2, and y = x₁ + x₂ ≥ 2).


APPLIED EXERCISES


8. In Exercise 26 on page 117, p is the price of a certain commodity (in dollars) and s its total sales (in 10,000 units). Use the joint density given in that exercise and the method of Section 7.2 to find the probability density of the random variable v = sp, the total amount of money spent on this commodity in units of $10,000.

9. In Exercise 19 on page 132, x is the amount (in dollars) a salesperson spends on gasoline and y is the amount (in dollars) for which he/she is reimbursed. Use the joint density given in that exercise and the method of Section 7.2 to find the probability density of the random variable z = x − y, the amount for which he/she is not reimbursed.
7.3 TRANSFORMATION TECHNIQUE: ONE VARIABLE


Let us show how the probability distribution or density of a function of a random variable can be determined without first getting its distribution function. In the discrete case there is no real problem so long as the relationship between the values of x and y = u(x) is one-to-one; all we have to do is make the appropriate substitution.


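EXAMPLE 7.4


If x is the number of heads obtained in four tosses of a balanced coin, find the probability distribution of y = 1/(1 + x).


Solution


Using the formula for the binomial distribution with n = 4 and θ = 1/2 (or referring to the result of Example 3.3 on page 79), we find that the probability distribution of x is given by

$$\begin{array}{c|ccccc} x & 0 & 1 & 2 & 3 & 4 \\ \hline f(x) & \frac{1}{16} & \frac{4}{16} & \frac{6}{16} & \frac{4}{16} & \frac{1}{16} \end{array}$$

Then, using the relationship y = 1/(1 + x) to substitute values of y for values of x, we find that the probability distribution of y is given by

$$\begin{array}{c|ccccc} y & 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ \hline g(y) & \frac{1}{16} & \frac{4}{16} & \frac{6}{16} & \frac{4}{16} & \frac{1}{16} \end{array}$$

If we had wanted to make the substitution directly in the formula for the binomial distribution with n = 4 and θ = 1/2, we could have substituted x = 1/y − 1 for x in

$$f(x) = \binom{4}{x}\left(\frac{1}{2}\right)^4 \quad \text{for } x = 0, 1, 2, 3, 4$$

getting

$$g(y) = f\left(\frac{1}{y} - 1\right) = \binom{4}{\frac{1}{y} - 1}\left(\frac{1}{2}\right)^4 \quad \text{for } y = 1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \tfrac{1}{5}$$

▲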


Note that in Example 7.4 the probabilities remained unchanged; the only 
difference is that in the result they are associated with the various values of y 
instead of the corresponding values of x. This is all there is to the transformation 
(or change of variable) technique in the discrete case so long as the relationship 
is one-to-one. If it is not one-to-one, we may proceed as in the following example: 


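EXAMPLE 7.5


With reference to Example 7.4, find the probability distribution of the random variable z = (x − 2)².


Solution


Calculating the probabilities h(z) associated with the various values of z, we get

$$h(0) = f(2) = \tfrac{6}{16}$$

$$h(1) = f(1) + f(3) = \tfrac{4}{16} + \tfrac{4}{16} = \tfrac{8}{16}$$

$$h(4) = f(0) + f(4) = \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{2}{16}$$

and, hence,

$$\begin{array}{c|ccc} z & 0 & 1 & 4 \\ \hline h(z) & \frac{6}{16} & \frac{8}{16} & \frac{2}{16} \end{array}$$

▲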

To perform a transformation of variable in the continuous case, we shall assume that the function given by y = u(x) is differentiable and either increasing or decreasing for all values within the range of x for which f(x) ≠ 0, so that the inverse function, given by x = w(y), exists for all of the corresponding values of y, and is differentiable except where u′(x) = 0.* Under these conditions, we can prove the following theorem:


* Note that, to avoid points where u'(x) might be 0, we generally did not include 
the endpoints of the intervals for which probability densities are non-zero. This is a practice 
we shall follow throughout this book. 
THEOREM 7.1 Let f(x) be the value of the probability density of the continuous random variable x at x. If the function given by y = u(x) is differentiable and either increasing or decreasing for all values within the range of x for which f(x) ≠ 0, then, for these values of x, the equation y = u(x) can be uniquely solved for x to give x = w(y), and for the corresponding values of y the probability density of y = u(x) is given by

$$g(y) = f[w(y)]\cdot|w'(y)| \quad \text{provided } u'(x) \neq 0$$

Elsewhere, g(y) = 0.


Proof. First let us prove the case where the function given by y = u(x) is increasing. As can be seen from Figure 7.3, x must take on a value between w(a) and w(b) when y takes on a value between a and b. Hence,

$$P(a < y < b) = P[w(a) < x < w(b)] = \int_{w(a)}^{w(b)} f(x)\,dx = \int_a^b f[w(y)]\,w'(y)\,dy$$

where we performed the change of variable y = u(x), or equivalently x = w(y), in the integral. In accordance with Definition 3.4, the integrand gives the probability density of y so long as w′(y) exists, and we can write

$$g(y) = f[w(y)]\,w'(y)$$

When the function given by y = u(x) is decreasing, it can be seen from Figure 7.3 that x must take on a value between w(b) and w(a) when y takes on a value between a and b. Hence,

$$P(a < y < b) = P[w(b) < x < w(a)] = \int_{w(b)}^{w(a)} f(x)\,dx = \int_b^a f[w(y)]\,w'(y)\,dy = -\int_a^b f[w(y)]\,w'(y)\,dy$$
Figure 7.3 Diagrams for proof of Theorem 7.1 (left panel: increasing function y = u(x), with w(a) < w(b) on the x-axis; right panel: decreasing function, with w(b) < w(a)).


where we performed the same change of variable as before, and it follows that

$$g(y) = -f[w(y)]\,w'(y)$$

Since w′(y) = dx/dy = 1/(dy/dx) is positive when the function given by y = u(x) is increasing, and −w′(y) is positive when the function given by y = u(x) is decreasing, we can combine the two cases by writing

$$g(y) = f[w(y)]\cdot|w'(y)|$$

▼
EXAMPLE 7.6
If x has the exponential distribution given by

$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find the probability density of the random variable y = √x.


Solution
The equation y = √x, relating the values of x and y, has the unique inverse x = y², which yields w′(y) = dx/dy = 2y. Therefore,

$$g(y) = e^{-y^2}\,|2y| = 2ye^{-y^2}$$

for y > 0 in accordance with Theorem 7.1. Since the probability of getting a value of y less than or equal to 0, like the probability of getting a value of x less than or equal to 0, is 0, it follows that the probability density of y is given by

$$g(y) = \begin{cases} 2ye^{-y^2} & \text{for } y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Note that this is the Weibull distribution of Exercise 15 on page 218 with α = 1 and β = 2. ▲


The two diagrams of Figure 7.4 illustrate what happens in Example 7.6 when we transform from x to y. As in the discrete case (for instance, Example 7.4), probabilities remain the same, but they pertain to different values (intervals of values) of the respective random variables. In the diagram on the left, the 0.35 probability pertains to the event that x will take on a value on the interval from 1 to 4, and in the diagram on the right, the 0.35 probability pertains to the event that y will take on a value on the interval from 1 to 2.
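Theorem 7.1 translates directly into a small helper. Given f, the inverse w, and its derivative w′, it returns g(y) = f[w(y)] · |w′(y)|. This minimal Python sketch uses hypothetical helper names and uses Simpson's rule as our own choice of integrator. Applied to Example 7.6, it confirms that P(1 < x < 4) and P(1 < y < 2) agree, each being e^(−1) − e^(−4) ≈ 0.35.

```python
import math


def transformed_density(f, w, w_prime):
    """Theorem 7.1: g(y) = f[w(y)] * |w'(y)|, where x = w(y) inverts y = u(x)."""
    return lambda y: f(w(y)) * abs(w_prime(y))


def f(x):
    """Exponential density of Example 7.6."""
    return math.exp(-x) if x > 0 else 0.0


# y = sqrt(x) has inverse x = w(y) = y**2, so w'(y) = 2y (valid for y > 0).
_g = transformed_density(f, w=lambda y: y * y, w_prime=lambda y: 2 * y)


def g(y):
    """Density of y = sqrt(x); zero for y <= 0 as in Example 7.6."""
    return _g(y) if y > 0 else 0.0


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    print(simpson(f, 1.0, 4.0))         # P(1 < x < 4)
    print(simpson(g, 1.0, 2.0))         # P(1 < y < 2), the same probability
    print(math.exp(-1) - math.exp(-4))  # exact value, about 0.35
```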


Figure 7.4 Diagrams for Example 7.6 (left panel: the density of x; right panel: $g(y) = 2ye^{-y^2}$).


EXAMPLE 7.7


If the double arrow of Figure 7.5 is spun so that the random variable θ has the uniform density

$$f(\theta) = \begin{cases} \frac{1}{\pi} & \text{for } -\frac{\pi}{2} < \theta < \frac{\pi}{2} \\ 0 & \text{elsewhere} \end{cases}$$

determine the probability density of x, the abscissa of the point on the x-axis to which the arrow will point.


Solution


As is apparent from the diagram, the relationship between x and θ is given by x = a · tan θ, so that

$$\frac{d\theta}{dx} = \frac{a}{a^2 + x^2}$$

and it follows that

$$g(x) = \frac{1}{\pi}\cdot\left|\frac{a}{a^2 + x^2}\right| = \frac{1}{\pi}\cdot\frac{a}{a^2 + x^2} \quad \text{for } -\infty < x < \infty$$

according to Theorem 7.1. Note that this is a special case of the Cauchy distribution of Exercise 3 on page 217. ▲

Figure 7.5 Diagram for Example 7.7 (x = a · tan θ).
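The spinning arrow of Example 7.7 is easy to simulate. This minimal Python sketch draws θ uniformly on (−π/2, π/2) and sets x = a · tan θ. The value a = 2 and the helper names are hypothetical. The sketch then compares the fraction of simulated values at or below t with the distribution function obtained by integrating g(x), namely 1/2 + arctan(t/a)/π.

```python
import math
import random


def simulate_arrow(a, n, seed=0):
    """Spin the arrow n times: theta uniform on (-pi/2, pi/2), x = a * tan(theta)."""
    rng = random.Random(seed)
    return [a * math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]


def cauchy_cdf(x, a):
    """Integral of (1/pi) * a / (a**2 + t**2) from -infinity to x."""
    return 0.5 + math.atan(x / a) / math.pi


if __name__ == "__main__":
    a = 2.0  # hypothetical distance parameter
    xs = simulate_arrow(a, 100_000)
    for t in (-4.0, 0.0, 1.0, 10.0):
        empirical = sum(x <= t for x in xs) / len(xs)
        print(t, round(empirical, 3), round(cauchy_cdf(t, a), 3))
```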


EXAMPLE 7.8


If F(x) is the value of the distribution function of the continuous random variable x at x, find the probability density of y = F(x).


Solution


As can be seen from Figure 7.6, the value of y corresponding to any particular value of x is given by the area under the graph of the density of x to the left of x. Differentiating y = F(x) with respect to x, we get

$$\frac{dy}{dx} = F'(x) = f(x)$$

and, hence,

$$\frac{dx}{dy} = \frac{1}{dy/dx} = \frac{1}{f(x)}$$

provided f(x) ≠ 0. It follows from Theorem 7.1 that

$$g(y) = f(x)\cdot\left|\frac{1}{f(x)}\right| = 1$$

for 0 < y < 1, and we can say that y has the uniform density with α = 0 and β = 1. ▲

Figure 7.6 Diagram for Example 7.8 (Probability integral transformation).


The transformation which we performed in Example 7.8 is called the 
probability integral transformation. The result is not only of theoretical impor- 
tance, but it facilitates the simulation of observed values of continuous random 
variables. A reference to how this is done, especially in connection with the 
normal distribution, is given on page 270. 
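Here is a minimal Python sketch of both directions of the probability integral transformation. For concreteness it assumes an exponential density with mean θ = 2, a hypothetical choice. The normal case mentioned in the text needs a numerical inverse of its distribution function, which this sketch does not attempt. Applying F to simulated values of x gives values that behave like a uniform sample on (0, 1). Applying the inverse of F to uniform random numbers gives simulated values of x.

```python
import math
import random

THETA = 2.0  # hypothetical mean of the exponential distribution


def F(x):
    """Exponential distribution function with mean THETA."""
    return 1 - math.exp(-x / THETA) if x > 0 else 0.0


def F_inverse(p):
    """Solve F(x) = p for 0 <= p < 1."""
    return -THETA * math.log(1 - p)


rng = random.Random(42)
# Forward direction: y = F(x) for simulated x should look uniform on (0, 1).
ys = [F(rng.expovariate(1 / THETA)) for _ in range(100_000)]
# Reverse direction: x = F^{-1}(u) for uniform u simulates values of x.
xs = [F_inverse(rng.random()) for _ in range(100_000)]

if __name__ == "__main__":
    print(sum(y <= 0.25 for y in ys) / len(ys))  # about 0.25
    print(sum(xs) / len(xs))                     # about THETA
```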

When the conditions underlying Theorem 7.1 are not met, we can be in 
serious difficulties, and we may have to use the method of Section 7.2 or a 
generalization of Theorem 7.1 referred to among the references on page 270; 
sometimes, there is an easy way out, as in the following example: 


EXAMPLE 7.9


If x has the standard normal distribution, find the probability density of z = x².


Solution


Since the function given by z = x² is decreasing for negative values of x and increasing for positive values of x, the conditions of Theorem 7.1 are not met. However, the transformation from x to z can be made in two steps: First we find the probability density of y = |x|, and then we find the probability density of z = y² (= x²).

So far as the first step is concerned, we already studied the transformation y = |x| in Example 7.2 on page 242; in fact, we showed there that if x has the standard normal distribution, then y = |x| has the probability density

$$g(y) = 2n(y; 0, 1) = \frac{2}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2}$$

for y > 0, and g(y) = 0 elsewhere. For the second step, the function given by z = y² is increasing for y > 0; that is, for all values of y for which g(y) ≠ 0. Thus, we can use Theorem 7.1, and since

$$\frac{dy}{dz} = \frac{1}{2}z^{-1/2}$$

we get

$$h(z) = \frac{2}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z}\cdot\left|\frac{1}{2}z^{-1/2}\right| = \frac{1}{\sqrt{2\pi}}\,z^{-1/2}e^{-\frac{1}{2}z}$$

for z > 0, and h(z) = 0 elsewhere. Observe that since Γ(1/2) = √π, the distribution we have arrived at for z is a chi-square distribution (see Definition 6.4 on page 213) with ν = 1. ▲
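The closing observation of Example 7.9 can be confirmed numerically. This minimal Python sketch uses hypothetical helper names. It compares the density h(z) derived above with the chi-square density for ν degrees of freedom written with `math.gamma`, taking ν = 1. It also squares simulated standard normal values and checks that about erf(1/√2) ≈ 0.683 of them fall at or below 1.

```python
import math
import random


def h(z):
    """Density of z = x**2 obtained in Example 7.9."""
    return math.exp(-z / 2) / (math.sqrt(2 * math.pi) * math.sqrt(z)) if z > 0 else 0.0


def chi_square_pdf(z, nu):
    """Chi-square density with nu degrees of freedom."""
    if z <= 0:
        return 0.0
    return z ** (nu / 2 - 1) * math.exp(-z / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))


if __name__ == "__main__":
    for z in (0.1, 1.0, 3.0):
        print(z, h(z), chi_square_pdf(z, 1))  # the two columns agree
    rng = random.Random(7)
    draws = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]
    print(sum(d <= 1.0 for d in draws) / len(draws), math.erf(1 / math.sqrt(2)))
```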


7.4 TRANSFORMATION TECHNIQUE: TWO VARIABLES


The method of the preceding section can also be used to find the distribution of a random variable which is a function of two or more random variables. Suppose, for instance, that we are given the joint distribution of two random variables x₁ and x₂, and that we want to determine the distribution of the random variable y = u(x₁, x₂). If the relationship between y and x₁ with x₂ held constant, or the relationship between y and x₂ with x₁ held constant, permits, we can proceed in the discrete case as in Example 7.4 to find the joint distribution of y and x₂, or x₁ and y, and then sum on the values of the other random variable to get the marginal distribution of y. In the continuous case, we first use Theorem 7.1 with the transformation formula written as

$$g(y, x_2) = f(x_1, x_2)\cdot\left|\frac{\partial x_1}{\partial y}\right|$$

or as

$$g(x_1, y) = f(x_1, x_2)\cdot\left|\frac{\partial x_2}{\partial y}\right|$$

where f(x₁, x₂) and the partial derivative must be expressed in terms of y and x₂, or x₁ and y. Then we integrate out the other variable to get the marginal density of y.
EXAMPLE 7.10


If x₁ and x₂ are independent random variables having Poisson distributions with the parameters λ₁ and λ₂, find the probability distribution of the random variable y = x₁ + x₂.


Solution


Since x₁ and x₂ are independent, their joint distribution is given by

$$f(x_1, x_2) = \frac{e^{-\lambda_1}\lambda_1^{x_1}}{x_1!}\cdot\frac{e^{-\lambda_2}\lambda_2^{x_2}}{x_2!} = \frac{e^{-(\lambda_1+\lambda_2)}\lambda_1^{x_1}\lambda_2^{x_2}}{x_1!\,x_2!}$$

for x₁ = 0, 1, 2, ... and x₂ = 0, 1, 2, .... Since y = x₁ + x₂ and, hence, x₁ = y − x₂, we can substitute y − x₂ for x₁, getting

$$g(y, x_2) = \frac{e^{-(\lambda_1+\lambda_2)}\lambda_2^{x_2}\lambda_1^{y-x_2}}{x_2!\,(y-x_2)!}$$

for y = 0, 1, 2, ... and x₂ = 0, 1, ..., y, for the joint distribution of y and x₂. Then, summing on x₂ from 0 to y, we get

$$h(y) = \sum_{x_2=0}^{y} \frac{e^{-(\lambda_1+\lambda_2)}\lambda_2^{x_2}\lambda_1^{y-x_2}}{x_2!\,(y-x_2)!} = \frac{e^{-(\lambda_1+\lambda_2)}}{y!}\sum_{x_2=0}^{y} \frac{y!}{x_2!\,(y-x_2)!}\,\lambda_2^{x_2}\lambda_1^{y-x_2}$$

after factoring out e^(−(λ₁+λ₂)) and multiplying and dividing by y!. Identifying the summation at which we arrived as the binomial expansion of (λ₁ + λ₂)^y, we finally get

$$h(y) = \frac{e^{-(\lambda_1+\lambda_2)}(\lambda_1 + \lambda_2)^y}{y!} \quad \text{for } y = 0, 1, 2, \ldots$$

and we have, thus, shown that the sum of two independent random variables having Poisson distributions with the parameters λ₁ and λ₂ has a Poisson distribution with the parameter λ = λ₁ + λ₂. ▲
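The summation step of Example 7.10 can be carried out numerically. This minimal Python sketch adds the joint probabilities g(y, x₂) over x₂ = 0, ..., y. The parameter values and the helper names are hypothetical. The sketch compares each sum with the Poisson probability for λ = λ₁ + λ₂.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of k with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pmf_of_sum(y, lam1, lam2):
    """h(y) = sum over x2 = 0..y of f(y - x2, x2), as in Example 7.10."""
    return sum(poisson_pmf(y - x2, lam1) * poisson_pmf(x2, lam2) for x2 in range(y + 1))


if __name__ == "__main__":
    lam1, lam2 = 1.5, 2.5  # hypothetical parameter values
    for y in range(6):
        print(y, pmf_of_sum(y, lam1, lam2), poisson_pmf(y, lam1 + lam2))
```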


256 Chap. 7: Functions of Random Variables 
EXAMPLE 7.11


If the joint density of x₁ and x₂ is given by

$$f(x_1, x_2) = \begin{cases} e^{-(x_1 + x_2)} & \text{for } x_1 > 0,\ x_2 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find the density function of y = x₁/(x₁ + x₂).


Solution


Since y decreases when x₂ increases and x₁ is held constant, we can use Theorem 7.1 (modified as indicated on page 254) to find the joint density of x₁ and y. Since y = x₁/(x₁ + x₂) yields x₂ = x₁ · (1 − y)/y and, hence,

$$\frac{\partial x_2}{\partial y} = -\frac{x_1}{y^2}$$

it follows that

$$g(x_1, y) = e^{-x_1/y}\left|-\frac{x_1}{y^2}\right| = \frac{x_1}{y^2}\,e^{-x_1/y}$$

for x₁ > 0 and 0 < y < 1. Finally, integrating out x₁ and changing the variable of integration to u = x₁/y, we get

$$h(y) = \int_0^{\infty} \frac{x_1}{y^2}\,e^{-x_1/y}\,dx_1 = \int_0^{\infty} u\,e^{-u}\,du = \Gamma(2) = 1$$

for 0 < y < 1, and h(y) = 0 elsewhere. Thus, the random variable y has the uniform density with α = 0 and β = 1. ▲
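A quick simulation agrees with Example 7.11. This minimal Python sketch draws x₁ and x₂ independently with density e^(−x), using `random.expovariate(1.0)`. The helper name is hypothetical. For a variable with the uniform density on (0, 1), the fraction of ratios y = x₁/(x₁ + x₂) at or below q should be close to q.

```python
import random


def simulate_ratio(n, seed=0):
    """Draw y = x1 / (x1 + x2) with x1, x2 independent, each with density e^{-x}."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x1, x2 = rng.expovariate(1.0), rng.expovariate(1.0)
        out.append(x1 / (x1 + x2))
    return out


if __name__ == "__main__":
    ys = simulate_ratio(100_000)
    for q in (0.1, 0.25, 0.5, 0.9):
        print(q, sum(y <= q for y in ys) / len(ys))  # close to q for a uniform y
```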


The preceding example could also have been worked by a general method where we begin with the joint distribution of two random variables x₁ and x₂ and determine the joint distribution of two new random variables y₁ = u₁(x₁, x₂) and y₂ = u₂(x₁, x₂). Then we can find the marginal distribution of y₁ or y₂ by summation or integration.

This method is used mainly in the continuous case, where we need the following theorem, which is a generalization of Theorem 7.1:


THEOREM 7.2 Let f(x₁, x₂) be the value of the joint probability density of the continuous random variables x₁ and x₂ at (x₁, x₂). If the functions given by y₁ = u₁(x₁, x₂) and y₂ = u₂(x₁, x₂) are partially differentiable with respect to both x₁ and x₂ and represent a one-to-one transformation for all values within the range of x₁ and x₂ for which f(x₁, x₂) ≠ 0, then, for these values of x₁ and x₂, the equations y₁ = u₁(x₁, x₂) and y₂ = u₂(x₁, x₂) can be uniquely solved for x₁ and x₂ to give x₁ = w₁(y₁, y₂) and x₂ = w₂(y₁, y₂), and for the corresponding values of y₁ and y₂ the joint probability density of y₁ = u₁(x₁, x₂) and y₂ = u₂(x₁, x₂) is given by

$$g(y_1, y_2) = f[w_1(y_1, y_2), w_2(y_1, y_2)]\cdot|J|$$

Here, J, called the Jacobian of the transformation, is the determinant

$$J = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{vmatrix}$$

Elsewhere, g(y₁, y₂) = 0.


We shall not prove this theorem, but information about Jacobians and their 
applications can be found in most textbooks on advanced calculus. There they 
are used mainly in connection with multiple integrals, say, when we want to 
change from rectangular coordinates to polar coordinates or from rectangular 
coordinates to spherical coordinates. 


EXAMPLE 7.12


With reference to the random variables x₁ and x₂ of Example 7.11, find

(a) the joint density of y₁ = x₁ + x₂ and y₂ = x₁/(x₁ + x₂);

(b) the marginal density of y₂.
Solution


(a) Solving y₁ = x₁ + x₂ and y₂ = x₁/(x₁ + x₂) for x₁ and x₂, we get x₁ = y₁y₂ and x₂ = y₁(1 − y₂), and it follows that

$$J = \begin{vmatrix} y_2 & y_1 \\ 1 - y_2 & -y_1 \end{vmatrix} = -y_1$$

Since the transformation is one-to-one, mapping the region x₁ > 0 and x₂ > 0 in the x₁x₂-plane into the region y₁ > 0 and 0 < y₂ < 1 in the y₁y₂-plane, we can use Theorem 7.2 and it follows that

$$g(y_1, y_2) = e^{-y_1}\cdot|-y_1| = y_1e^{-y_1}$$

for y₁ > 0 and 0 < y₂ < 1; elsewhere, g(y₁, y₂) = 0.


(b) Using the joint density obtained in part (a) and integrating out y₁, we get

$$h(y_2) = \int_0^{\infty} g(y_1, y_2)\,dy_1 = \int_0^{\infty} y_1e^{-y_1}\,dy_1 = \Gamma(2) = 1$$

for 0 < y₂ < 1; elsewhere, h(y₂) = 0. Note that this result agrees with that obtained on page 256. ▲
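The Jacobian in Theorem 7.2 can also be approximated numerically, which gives an independent check on a hand calculation such as the one in Example 7.12. This minimal Python sketch uses central differences. The helper names and step size are hypothetical choices, and finite differences are a swapped-in numerical technique rather than the text's method. It evaluates J for the inverse x₁ = y₁y₂, x₂ = y₁(1 − y₂) of Example 7.12 and then forms g(y₁, y₂) = f[w₁, w₂] · |J|.

```python
import math


def jacobian_2x2(w1, w2, y1, y2, h=1e-6):
    """Central-difference value of J = det [[dx1/dy1, dx1/dy2], [dx2/dy1, dx2/dy2]]."""
    d11 = (w1(y1 + h, y2) - w1(y1 - h, y2)) / (2 * h)
    d12 = (w1(y1, y2 + h) - w1(y1, y2 - h)) / (2 * h)
    d21 = (w2(y1 + h, y2) - w2(y1 - h, y2)) / (2 * h)
    d22 = (w2(y1, y2 + h) - w2(y1, y2 - h)) / (2 * h)
    return d11 * d22 - d12 * d21


def f(x1, x2):
    """Joint density of Examples 7.11 and 7.12."""
    return math.exp(-(x1 + x2)) if x1 > 0 and x2 > 0 else 0.0


def w1(y1, y2):
    return y1 * y2        # x1 = y1 * y2


def w2(y1, y2):
    return y1 * (1 - y2)  # x2 = y1 * (1 - y2)


def g(y1, y2):
    """Theorem 7.2: g(y1, y2) = f[w1, w2] * |J|."""
    return f(w1(y1, y2), w2(y1, y2)) * abs(jacobian_2x2(w1, w2, y1, y2))


if __name__ == "__main__":
    print(jacobian_2x2(w1, w2, 2.0, 0.3))     # about -2.0, that is, -y1
    print(g(2.0, 0.3), 2.0 * math.exp(-2.0))  # both about y1 * e^{-y1}
```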


EXAMPLE 7.13
If the joint density of x₁ and x₂ is given by

$$f(x_1, x_2) = \begin{cases} 1 & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find
(a) the joint density of y = x₁ + x₂ and z = x₂;
(b) the marginal density of y.*


*In Exercise 6 on page 244 the reader was asked to work the same problem by the method of Section 7.2.


Solution


(a) Solving y = x₁ + x₂ and z = x₂ for x₁ and x₂, we get x₁ = y − z and x₂ = z, so that

$$J = \begin{vmatrix} 1 & -1 \\ 0 & 1 \end{vmatrix} = 1$$

Since the transformation is one-to-one, mapping the region 0 < x₁ < 1 and 0 < x₂ < 1 in the x₁x₂-plane into the region z < y < z + 1 and 0 < z < 1 in the yz-plane (see Figure 7.7), we can use Theorem 7.2 and we get

$$g(y, z) = 1\cdot|1| = 1$$

for z < y < z + 1 and 0 < z < 1; elsewhere, g(y, z) = 0.

Figure 7.7 Transformed sample space for Example 7.13.


(b) Integrating out z separately for y ≤ 0, 0 < y < 1, 1 < y < 2, and y ≥ 2, we get

$$h(y) = \begin{cases} 0 & \text{for } y \le 0 \\ \int_0^y 1\,dz = y & \text{for } 0 < y < 1 \\ \int_{y-1}^1 1\,dz = 2 - y & \text{for } 1 < y < 2 \\ 0 & \text{for } y \ge 2 \end{cases}$$

and to make the density function continuous, we let h(1) = 1. We have thus shown that the sum of the given random variables has the triangular probability density, whose graph is shown in Figure 7.8. ▲


Figure 7.8 Triangular probability density (h(y) = y for 0 < y < 1 and h(y) = 2 − y for 1 < y < 2).
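The triangular density of Example 7.13 can be checked against simulated sums of two independent uniform random numbers. This minimal Python sketch uses hypothetical helper names. It integrates h(y) by hand to get the distribution function H, which is t²/2 for 0 < t ≤ 1 and 1 − (2 − t)²/2 for 1 < t < 2, and compares H with the observed fractions.

```python
import random


def h(y):
    """Triangular density of y = x1 + x2 from Example 7.13 (with h(1) = 1)."""
    if 0 < y < 1:
        return y
    if 1 <= y < 2:
        return 2 - y
    return 0.0


def H(t):
    """Distribution function obtained by integrating h."""
    if t <= 0:
        return 0.0
    if t < 1:
        return t * t / 2
    if t < 2:
        return 1 - (2 - t) ** 2 / 2
    return 1.0


def simulated_H(t, n=100_000, seed=5):
    """Fraction of simulated sums of two uniform (0, 1) numbers that are <= t."""
    rng = random.Random(seed)
    return sum(rng.random() + rng.random() <= t for _ in range(n)) / n


if __name__ == "__main__":
    for t in (0.5, 1.0, 1.5):
        print(t, simulated_H(t), H(t))
```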


So far we have considered only functions of one or two random variables, but the method based on Theorem 7.2 can easily be generalized to functions of three or more random variables. For instance, if we are given the joint probability density of three random variables x₁, x₂, and x₃ and we want to find the joint probability density of the random variables y₁ = u₁(x₁, x₂, x₃), y₂ = u₂(x₁, x₂, x₃), and y₃ = u₃(x₁, x₂, x₃), the general approach is the same, but the Jacobian is now the 3 × 3 determinant

$$J = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \frac{\partial x_1}{\partial y_3} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} & \frac{\partial x_2}{\partial y_3} \\ \frac{\partial x_3}{\partial y_1} & \frac{\partial x_3}{\partial y_2} & \frac{\partial x_3}{\partial y_3} \end{vmatrix}$$

Once we have determined the joint probability density of the three new random variables, we can find the marginal density of any two of the random variables, or any one, by integration.


EXAMPLE 7.14 
If the joint probability density of x,, X;, and x; is given by 


Er 
f(x, x, Xx,) = F | for-x; > 0, x2 > 0, ху > 0 
elsewhere 
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find 
(a) the joint density of y, = x, + X; + Xj, y; = X;, and y, = X; 
(b) the marginal density of y,. 


Solution 
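(a) Solving the system of equations $y_1 = x_1 + x_2 + x_3$, $y_2 = x_2$, and $y_3 = x_3$
for $x_1$, $x_2$, and $x_3$, we get $x_1 = y_1 - y_2 - y_3$, $x_2 = y_2$, and $x_3 = y_3$.
It follows that
$$J = \begin{vmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1$$
and, since the transformation is one-to-one, that
$$g(y_1, y_2, y_3) = e^{-y_1} \cdot |1| = e^{-y_1}$$
for $y_2 > 0$, $y_3 > 0$, and $y_1 > y_2 + y_3$; elsewhere, $g(y_1, y_2, y_3) = 0$.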
(b) Integrating out $y_2$ and $y_3$, we get
$$h(y_1) = \int_0^{y_1} \int_0^{y_1 - y_3} e^{-y_1}\, dy_2\, dy_3 = \frac{1}{2} y_1^2 e^{-y_1}$$
for $y_1 > 0$; $h(y_1) = 0$ elsewhere. Observe that we have shown that the
sum of three independent random variables having the gamma distribution
with $\alpha = 1$ and $\beta = 1$ is a random variable having the gamma
distribution with $\alpha = 3$ and $\beta = 1$. ▲


As the reader will find in Exercise 25 below, it would have been easier to obtain 
the result for part (b) of the preceding example by using the method based on 
Theorem 7.1 as modified on page 254. 


THEORETICAL EXERCISES 

1. If x has a geometric distribution with 0 = 1, find the probability distribution 
of $y = 4 - 5x$.

2. If x has a hypergeometric distribution with К = 4, N = 15, and n = 3, find 
the probability distribution of y, the number of successes minus the number 
of failures. 
3. If $x$ is the total we roll with a pair of dice, for which the probability distribution
is given on page 78, find the probability distribution of the remainder we
get when $x$ is divided by 3.


4. Use the method of Section 7.3 to prove Theorem 6.7 on page 225.


5. Use the transformation technique to rework Example 7.1 on page 241.

6. If $x = \ln y$ has a normal distribution with the mean $\mu$ and the variance $\sigma^2$,
find the probability density of $y$, which is said to have the log-normal
distribution.


7. If the probability density of $x$ is given by
$$f(x) = \begin{cases} \dfrac{x}{2} & \text{for } 0 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
find the probability density of $y = x^3$. Also plot the graphs of these two
probability densities and indicate the respective areas under the curves
representing $P(\frac{1}{2} < x < 1)$ and $P(\frac{1}{8} < y < 1)$.


8. If the probability density of $x$ is given by
$$f(x) = \begin{cases} \dfrac{kx^3}{(1 + 2x)^6} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
where $k$ is an appropriate constant, find the probability density of the random
variable $y = \dfrac{2x}{1 + 2x}$. Also find the value of $k$ by comparing the result with
the probability density of Definition 6.5.


9. If $x$ has a uniform density with $\alpha = 0$ and $\beta = 1$, show that the random
variable $y = -2 \cdot \ln x$ has a gamma distribution. What are its parameters?

10. If the probability density of $x$ is given by


ЖЫЙ 12 or-1«x«1 
0 elsewhere 
find
(a) the probability density of $y = |x|$ using the result of Example 7.2 on
page 242;
(b) the probability density of $z = x^2$ $(= y^2)$.


11. If $x$ has a uniform density with $\alpha = -1$ and $\beta = 3$, find
(a) the probability density of $y = |x|$ using the result of Example 7.2 on
page 242;
(b) the probability density of $z = x^2$ $(= y^2)$.

12. If the joint probability distribution of $x_1$ and $x_2$ is given by $f(x_1, x_2) = \frac{1}{9}$
for $x_1 = 1, 2, 3$ and $x_2 = 1, 2, 3$, find
(a) the probability distribution of $y = x_1 x_2$;
(b) the probability distribution of $z = x_1/x_2$.
13. If the joint probability distribution of $x_1$ and $x_2$ is given by
$$f(x_1, x_2) = \frac{x_1 x_2}{18}$$
for $x_1 = 1, 2$ and $x_2 = 1, 2, 3$, find
(a) the joint probability distribution of $y_1 = x_1 + x_2$ and $y_2 = x_1 - x_2$;
(b) the marginal distribution of $y_1$.
14. If $x_1$, $x_2$, and $x_3$ have the multinomial distribution (see page 203) with $n = 2$,
8, = 1, 6; = 1, and Ө, = 4, find the joint probability distribution of y, = 
$x_1 + x_2$, $y_2 = x_1 - x_2$, and $y_3 = x_3$.


15. With reference to Example 3.12 on page 103, find
(a) the probability distribution of $u = x + y$;
(b) the probability distribution of $v = xy$;
(c) the probability distribution of $w = x - y$.

16. If $x_1$ and $x_2$ are independent random variables having binomial distributions
with the respective parameters $n_1$ and $\theta$ and $n_2$ and $\theta$, show that $y = x_1 + x_2$
has the binomial distribution with the parameters $n_1 + n_2$ and $\theta$. (Hint: Use
Theorem 1.12.)

17. If $x_1$ and $x_2$ are independent random variables having the geometric distribution
with the parameter $\theta$, show that $y = x_1 + x_2$ is a random variable having
the negative binomial distribution with the parameters $\theta$ and $k = 2$.

18. If $x$ and $y$ are independent random variables having the standard normal
distribution, show that z = x + y also has a normal distribution. (Hint: 
Complete the square in the exponent.) What are the mean and the variance 
of this normal distribution? 


19. If the joint density of $x$ and $y$ is given by
$$f(x, y) = \begin{cases} 12xy(1 - y) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
find the probability density of the random variable $z = xy^2$.




20. If $x_1$ and $x_2$ are independent random variables having the Cauchy distribution
$$f(x) = \frac{1}{\pi(1 + x^2)} \quad \text{for } -\infty < x < \infty$$
find, and identify, the probability density of $y = x_1 + x_2$. (Hint: Use partial
fractions to perform the necessary integration.)


21. If $x$ and $y$ are two random variables whose joint density is given by
$$f(x, y) = \begin{cases} \frac{1}{2} & \text{for } x > 0,\ y > 0,\ x + y < 2 \\ 0 & \text{elsewhere} \end{cases}$$
find the probability density of $u = y - x$.
22. Let $x_1$ and $x_2$ be two continuous random variables having the joint density
$$f(x_1, x_2) = \begin{cases} 4x_1 x_2 & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the joint density of $y_1 = x_1^2$ and $y_2 = x_1 x_2$.
23. If the joint density of the random variables $x$ and $y$ is given by
$$f(x, y) = \begin{cases} 24xy & \text{for } 0 < x < 1,\ 0 < y < 1,\ x + y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
find the joint density of $z = x + y$ and $w = x$.

24. Let $x$ and $y$ be two independent random variables having identical gamma
distributions.
(a) Find the joint density of the random variables $u = x/(x + y)$ and
$v = x + y$.
(b) Show that the marginal density of $u$ is a beta distribution.


25. Rework Example 7.14 without using the Jacobian technique.

26. In Example 7.13 we found the probability density of the sum of two independent
random variables having the uniform density with $\alpha = 0$ and $\beta = 1$.
Given a third random variable $x_3$ which has the same uniform density and
is independent of $x_1$ and $x_2$, show that if $u = y + x_3 = x_1 + x_2 + x_3$, then
(a) the joint density of $u$ and $y$ is given by
$$g(u, y) = \begin{cases} y & \text{for Regions I and II of Figure 7.9} \\ 2 - y & \text{for Regions III and IV of Figure 7.9} \\ 0 & \text{elsewhere} \end{cases}$$
Figure 7.9 Diagram for Exercise 26.


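(b) the probability density of $u$ is given by
$$h(u) = \begin{cases} 0 & \text{for } u \le 0 \\ \frac{1}{2}u^2 & \text{for } 0 < u < 1 \\ \frac{1}{2}u^2 - \frac{3}{2}(u - 1)^2 & \text{for } 1 < u < 2 \\ \frac{1}{2}u^2 - \frac{3}{2}(u - 1)^2 + \frac{3}{2}(u - 2)^2 & \text{for } 2 < u < 3 \\ 0 & \text{for } u \ge 3 \end{cases}$$
Note that if we let $h(1) = h(2) = \frac{1}{2}$, this will make the probability
density of $u$ continuous.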


APPLIED EXERCISES 


27. According to the Maxwell-Boltzmann law of theoretical physics, the probability
density of $v$, the velocity of a gas molecule, is
$$f(v) = \begin{cases} kv^2 e^{-\beta v^2} & \text{for } v > 0 \\ 0 & \text{elsewhere} \end{cases}$$
where $\beta$ depends on its mass and the absolute temperature, and $k$ is an
appropriate constant. Show that the probability density of the kinetic energy
$E$, whose values are related to those of $v$ by means of the equation $E = \frac{1}{2}mv^2$,
is a gamma distribution.
28. With reference to Exercise 25 on page 117, find the probability density of
$z = \frac{x + y}{2}$, the average proportion of correct answers a student will get on
the two aptitude tests.
29. Use the Jacobian technique to rework Exercise 8 on page 245, by determining 
first the joint density of the random variables v = sp and w = p, and then 
the marginal density of v. 


7.5 MOMENT-GENERATING FUNCTION TECHNIQUE


Moment-generating functions can play an important role in determining the 
probability distribution or density of a function of random variables when the 
function is a linear combination of n independent random variables. We shall 
illustrate this technique here when such a linear combination is, in fact, the sum 
of n independent random variables, leaving it to the reader to generalize it in 
Exercises 5 and 6 on page 269. 

The method is based on the theorem that the moment-generating function 
of the sum of n independent random variables equals the product of their 
moment-generating functions, namely, 


THEOREM 7.3 If $x_1, x_2, \ldots,$ and $x_n$ are independent random variables and
$y = x_1 + x_2 + \cdots + x_n$, then
$$M_y(t) = \prod_{i=1}^{n} M_{x_i}(t)$$
where $M_{x_i}(t)$ is the value of the moment-generating function of $x_i$ at $t$.


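Proof. Making use of the fact that the random variables are independent
and, hence,
$$f(x_1, x_2, \ldots, x_n) = f_1(x_1) \cdot f_2(x_2) \cdots f_n(x_n)$$
according to Definition 3.14, we can write
$$\begin{aligned} M_y(t) = E(e^{yt}) &= E\left[e^{(x_1 + x_2 + \cdots + x_n)t}\right] \\ &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{(x_1 + x_2 + \cdots + x_n)t} f_1(x_1) f_2(x_2) \cdots f_n(x_n)\, dx_1\, dx_2 \cdots dx_n \\ &= \int_{-\infty}^{\infty} e^{x_1 t} f_1(x_1)\, dx_1 \cdot \int_{-\infty}^{\infty} e^{x_2 t} f_2(x_2)\, dx_2 \cdots \int_{-\infty}^{\infty} e^{x_n t} f_n(x_n)\, dx_n \\ &= \prod_{i=1}^{n} M_{x_i}(t) \end{aligned}$$
which proves the theorem for the continuous case. To prove it for the
discrete case, we have only to replace all of the integrals by sums. ■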


Note that if we want to use Theorem 7.3 to find the probability distribution
or density of the random variable $y = x_1 + x_2 + \cdots + x_n$, we must be able to
identify whatever probability distribution or density corresponds to $M_y(t)$ and
rely on the first of the two theorems which we gave on page 229; namely, the
uniqueness theorem about the correspondence between moment-generating functions
and probability distributions or densities.


EXAMPLE 7.15 


Find the probability distribution of the sum of $n$ independent random variables
$x_1, x_2, \ldots,$ and $x_n$ having Poisson distributions with the respective parameters
$\lambda_1, \lambda_2, \ldots,$ and $\lambda_n$.

Solution
By Theorem 5.9, we have
$$M_{x_i}(t) = e^{\lambda_i(e^t - 1)}$$
and, hence, with $y = x_1 + x_2 + \cdots + x_n$, we obtain
$$M_y(t) = \prod_{i=1}^{n} e^{\lambda_i(e^t - 1)} = e^{(\lambda_1 + \lambda_2 + \cdots + \lambda_n)(e^t - 1)}$$
which can readily be identified as the moment-generating function of the
Poisson distribution with the parameter $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_n$. Thus,
the distribution of the sum of $n$ independent random variables having
Poisson distributions with the parameters $\lambda_i$ is a Poisson distribution with
the parameter $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_n$. Note that in Example 7.10 we
proved this for $n = 2$. ▲
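The conclusion of Example 7.15 can also be checked directly, without moment-generating functions, by convolving probability distributions. The Python sketch below uses only the standard library. The parameters $\lambda_i = 0.5, 1.2, 2.3$ are arbitrary illustrative values. It builds the distribution of the running sum one Poisson variable at a time and then compares it with the Poisson distribution whose parameter is $\lambda_1 + \lambda_2 + \lambda_3$.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def convolve(p, q):
    """Distribution of the sum of two independent variables on 0, 1, 2, ...

    Only the first len(p) probabilities are returned; P(sum = k) depends on p and q
    only through their values at 0, ..., k, so truncation introduces no error here.
    """
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(len(p))]

lams = [0.5, 1.2, 2.3]
K = 30  # number of values 0, 1, ..., K - 1 tabulated

pmf = [poisson_pmf(k, lams[0]) for k in range(K)]
for lam in lams[1:]:
    pmf = convolve(pmf, [poisson_pmf(k, lam) for k in range(K)])

target = [poisson_pmf(k, sum(lams)) for k in range(K)]
print(max(abs(a - b) for a, b in zip(pmf, target)))
```

The printed maximum discrepancy is at the level of floating-point rounding. This agrees with the moment-generating-function argument.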


EXAMPLE 7.16 


If $x_1, x_2, \ldots,$ and $x_n$ are independent random variables having exponential
distributions with the same parameter $\theta$, find the probability density of the random
variable $y = x_1 + x_2 + \cdots + x_n$.

Solution
Since the exponential distribution is a gamma distribution with $\alpha = 1$ and
$\beta = \theta$, we have
$$M_{x_i}(t) = (1 - \theta t)^{-1}$$
by Theorem 6.4, and, hence,
$$M_y(t) = \prod_{i=1}^{n} (1 - \theta t)^{-1} = (1 - \theta t)^{-n}$$
according to the second of the special rules for products in Appendix II.
Identifying the moment-generating function of $y$ as that of a gamma distribution
with $\alpha = n$ and $\beta = \theta$, we conclude that the distribution of the sum
of $n$ independent random variables having exponential distributions with
the same parameter $\theta$ is a gamma distribution with the parameters $\alpha = n$
and $\beta = \theta$. Note that this agrees with the result of Example 7.14, where
we showed that the sum of three independent random variables having
exponential distributions with the parameter $\theta = 1$ has a gamma distribution
with $\alpha = 3$ and $\beta = 1$. ▲
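To see Example 7.16 numerically, the Python sketch below uses only the standard library; the function names are illustrative. It simulates $P(y \le c)$ for a sum of $n$ independent exponential variables with mean $\theta$ and compares the result with the gamma distribution function for $\alpha = n$ and $\beta = \theta$. For integer $\alpha$ that distribution function has the closed form $1 - \sum_{k=0}^{n-1} e^{-c/\theta}(c/\theta)^k/k!$. This is a standard identity that the sketch assumes; it is not derived in this section. The setting $n = 3$, $\theta = 1$ reproduces the situation of Example 7.14.

```python
import math
import random

def gamma_cdf_integer_alpha(c, n, theta):
    """P(y <= c) for the gamma distribution with alpha = n (a positive integer) and beta = theta."""
    if c <= 0:
        return 0.0
    r = c / theta
    return 1.0 - sum(math.exp(-r) * r ** k / math.factorial(k) for k in range(n))

def simulated_cdf(c, n, theta, trials=100_000, seed=7):
    """Monte Carlo estimate of P(x1 + ... + xn <= c) for independent exponentials with mean theta."""
    rng = random.Random(seed)
    # random.expovariate takes the rate 1/theta, not the mean theta.
    hits = sum(1 for _ in range(trials)
               if sum(rng.expovariate(1 / theta) for _ in range(n)) <= c)
    return hits / trials

print(gamma_cdf_integer_alpha(2.0, 3, 1.0), simulated_cdf(2.0, 3, 1.0))
```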


Theorem 7.3 also provides an easy and elegant way of deriving the moment-generating
function of the binomial distribution. Suppose that $x_1, x_2, \ldots,$ and
$x_n$ are independent random variables having the same Bernoulli distribution
$f(x; \theta) = \theta^x (1 - \theta)^{1-x}$ for $x = 0, 1$. By Definition 4.6,
$$M_{x_i}(t) = e^{0 \cdot t}(1 - \theta) + e^{1 \cdot t}\theta = 1 + \theta(e^t - 1)$$
so that Theorem 7.3 yields
$$M_y(t) = \prod_{i=1}^{n} \left[1 + \theta(e^t - 1)\right] = \left[1 + \theta(e^t - 1)\right]^n$$
which is readily identified as the moment-generating function of the binomial
distribution with the parameters $n$ and $\theta$. Of course, $y = x_1 + x_2 + \cdots + x_n$ is
the total number of successes in $n$ trials, since $x_1$ is the number of successes on
the first trial, $x_2$ is the number of successes on the second trial, ..., and $x_n$ is the
number of successes on the $n$th trial. As we shall see later, this is a fruitful way
of looking at the binomial distribution.
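The same result can be confirmed at the level of distributions: convolving $n$ copies of the Bernoulli distribution should reproduce the binomial distribution term by term. The Python sketch below does exactly that using only the standard library. The values $n = 10$ and $\theta = 0.3$ are arbitrary illustrative choices.

```python
import math

def convolve(p, q):
    """Distribution of the sum of two independent variables taking the values 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def sum_of_bernoullis(n, theta):
    pmf = [1.0]  # an empty sum equals 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, [1 - theta, theta])  # f(0) = 1 - theta, f(1) = theta
    return pmf

def binomial_pmf(n, theta):
    return [math.comb(n, x) * theta ** x * (1 - theta) ** (n - x) for x in range(n + 1)]

n, theta = 10, 0.3
print(max(abs(a - b) for a, b in zip(sum_of_bernoullis(n, theta), binomial_pmf(n, theta))))
```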


THEORETICAL EXERCISES 


1. Use the moment-generating function technique to rework Exercise 16 on 
page 263. 


2. Find the moment-generating function of the negative binomial distribution
by making use of the fact that if $k$ independent random variables have
geometric distributions with the same parameter $\theta$, their sum is a random
variable having the negative binomial distribution with the parameters $\theta$ and
$k$. (Hint: Use the result of Exercise 4 on page 198.)
3. If $n$ independent random variables have the same gamma distribution with
the parameters $\alpha$ and $\beta$, find the moment-generating function of their sum
and, if possible, identify its distribution.

4. If $n$ independent random variables $x_i$ have normal distributions with the
means $\mu_i$ and the standard deviations $\sigma_i$, find the moment-generating function
of their sum and identify the corresponding distribution, its mean, and its
variance.


5. Prove the following generalization of Theorem 7.3: If $x_1, x_2, \ldots,$ and $x_n$ are
independent random variables and $y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$, then
$$M_y(t) = \prod_{i=1}^{n} M_{x_i}(a_i t)$$
where $M_{x_i}(t)$ is the value of the moment-generating function of $x_i$ at $t$.

6. Use the result of Exercise 5 to show that if $n$ independent random variables
$x_i$ have normal distributions with the means $\mu_i$ and the standard deviations
$\sigma_i$, then $y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$ has a normal distribution. What are
the mean and the standard deviation of this distribution?


APPLIED EXERCISES 


7. A lawyer has an unlisted number on which she receives on the average 24 
calls every half hour and a listed number on which she receives on the average 
10.9 calls every half hour. If it can be assumed that the number of calls she 
receives on these phones are independent random variables having Poisson 
distributions, what are the probabilities that in half an hour she will receive 
altogether 
(a) 14 calls; 

(b) at most 6 calls? 

8. The number of fish a person catches per hour at Woods Canyon Lake is a
random variable having the Poisson distribution with $\lambda = 1.6$. Use the result
of Example 7.15 to find the probabilities that a person fishing there will catch 
(a) four fish in two hours; 

(b) at least two fish in three hours; 
(c) at most two fish in four hours. 


9. If the number of minutes it takes a service station attendant to balance a tire
is a random variable having an exponential distribution with the parameter
$\theta = 5$, what are the probabilities that the attendant will take

(a) less than 8 minutes to balance 2 tires;
(b) at least 12 minutes to balance 3 tires?
10. If the number of minutes a doctor spends with a patient is a random variable
having an exponential distribution with the parameter $\theta = 9$, what are the
probabilities that it will take the doctor at least 20 minutes to treat 
(a) two patients; 

(b) three patients? 
Sampling Distributions


8.1 INTRODUCTION


Statistics concerns itself mainly with conclusions and predictions resulting from 
chance outcomes that occur in carefully planned experiments or investigations. 
In the finite case, these chance outcomes constitute a subset, or sample, of 
measurements or observations from a larger set of values called the population. 
In the continuous case they are usually values of identically distributed random 
variables, whose distribution we refer to as the population distribution, or the 
infinite population sampled. The word "infinite" implies that there is, logically 
speaking, no limit to the number of random variables whose values we could 
observe. 

All these terms are used here somewhat unconventionally. If a scientist 
must choose and then weigh five of 40 guinea pigs as part of an experiment, a 
layman might say that the ones she selects constitute the sample. This is how the 
term “sample” is used in everyday language. In statistics, it is preferable to look 
upon the weights of the five guinea pigs as a sample from the population which 
consists of the weights of all 40 guinea pigs. In this way, the population as well 
as the sample consists of numbers. Also, suppose that, to estimate the average 
useful life of a certain kind of transistor, an engineer selects ten of these transistors, 
tests them over a period of time, and records for each one the time to failure. If 
these times to failure are values of random variables having the exponential 
distribution with the parameter $\theta$, we say that they constitute a sample from this
exponential population. 

As can well be imagined, not all samples lend themselves to valid 
generalizations about the populations from which they came. In fact, most of 
the methods of inference discussed in this book are based on the assumption 
that we are dealing with random samples. In practice, we often deal with random 
samples from populations that are finite, but large enough to be treated as if they 
were infinite. Thus, most statistical theory and most of the methods we shall 
discuss apply to samples from infinite populations, and we shall begin here with 
a definition of random samples from infinite populations. Random samples from 
finite populations will be treated later in Section 8.3. 


DEFINITION 8.1 If $x_1, x_2, \ldots,$ and $x_n$ are independent and identically distributed
random variables, we say that they constitute a random sample
from the infinite population given by their common distribution.*


If $f(x_1, x_2, \ldots, x_n)$ is the value of the joint distribution of such a set of random
variables at $(x_1, x_2, \ldots, x_n)$, we can write
$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i)$$
where $f(x_i)$ is the value of the population distribution at $x_i$. Observe that Definition
8.1 applies also to sampling with replacement from finite populations; sampling
without replacement from finite populations will be discussed later on page 278.

Statistical inferences are usually based on statistics, that is, on random 
variables which are functions of a set of random variables $x_1, x_2, \ldots,$ and $x_n$
constituting a random sample. Typical of what we mean by "statistic" are the 
sample mean and the sample variance. 


DEFINITION 8.2 If $x_1, x_2, \ldots,$ and $x_n$ constitute a random sample, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

* In the past, it has been common practice to apply the term “random sample” to
the values of the random variables instead of to the random variables themselves. Intuitively,
this makes more sense, but it does not conform with current usage.
is called the sample mean and
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
is called the sample variance.*


As they are given here, these definitions apply only to random samples, but the
sample mean and the sample variance can, similarly, be defined for any set of
random variables $x_1, x_2, \ldots,$ and $x_n$.

It is common practice to apply the terms "statistic," "sample mean," and
"sample variance" also to values of the corresponding random variables. For
instance, to compute the mean and the variance of a set of observed data, we
substitute into the formulas
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where the $x_i$ denote observed values of the corresponding random variables. Such
values of $\bar{x}$ and $s^2$ are often used to estimate the mean $\mu$ and the variance $\sigma^2$ of
the population from which the data were obtained. It must be understood, of
course, that we have introduced $\bar{x}$ and $s^2$ as examples of statistics, and that there
are many other statistics which can be used to estimate $\mu$, $\sigma^2$, and other population
parameters.
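Substituting observed data into these formulas is straightforward. The Python sketch below computes $\bar x$ and $s^2$ with the divisor $n - 1$ for a small set of hypothetical data values. It checks the results against the standard library's `statistics.mean` and `statistics.variance`; the latter also divides by $n - 1$.

```python
import statistics

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2 = sum of (x_i - xbar)^2 divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [12.1, 9.8, 11.4, 10.7, 13.0]  # hypothetical observed values
print(sample_mean(data), sample_variance(data))
print(statistics.mean(data), statistics.variance(data))
```

For these values $\bar x = 11.4$ and $s^2 = 6.1/4 = 1.525$, up to floating-point rounding in the printed output.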

Since statistics are random variables, their values will vary from sample to 
sample, and it is customary to refer to their distributions as sampling distributions. 
Most of the remainder of this chapter will be devoted to the sampling distributions 
of statistics which play important roles in applications. 


8.2 THE DISTRIBUTION OF THE MEAN


First let us study some theory about the sampling distribution of the mean, making 
only some very general assumptions about the nature of the population sampled. 


* The reason for dividing by $n - 1$ rather than the seemingly more logical choice,
$n$, will be explained in Section 10.3.
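THEOREM 8.1 If $x_1, x_2, \ldots,$ and $x_n$ constitute a random sample from an
infinite population which has the mean $\mu$ and the variance $\sigma^2$, then
$$E(\bar{x}) = \mu \quad \text{and} \quad \operatorname{var}(\bar{x}) = \frac{\sigma^2}{n}$$

Proof. Letting $y = \bar{x}$ in Theorem 4.14 and, hence, setting $a_i = \frac{1}{n}$, we get
$$E(\bar{x}) = \sum_{i=1}^{n} \frac{1}{n} \cdot \mu = n\left(\frac{1}{n} \cdot \mu\right) = \mu$$
since $E(x_i) = \mu$. Then, by the Corollary of Theorem 4.14 on page 167, we
conclude that
$$\operatorname{var}(\bar{x}) = \sum_{i=1}^{n} \frac{1}{n^2} \cdot \sigma^2 = n\left(\frac{1}{n^2} \cdot \sigma^2\right) = \frac{\sigma^2}{n}$$
■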

It is customary to write $E(\bar{x})$ as $\mu_{\bar{x}}$ and $\operatorname{var}(\bar{x})$ as $\sigma_{\bar{x}}^2$, and refer to $\sigma_{\bar{x}}$ as the
standard error of the mean. The formula for the standard error of the mean,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
shows that the standard deviation of the distribution of $\bar{x}$ decreases
when $n$, the sample size, is increased. This means that when $n$ becomes larger
and we actually have more information (the values of more random variables),
we can expect values of $\bar{x}$ to be closer to $\mu$, the quantity they are intended to
estimate. If we refer to Chebyshev's theorem as it is formulated in Exercise 16
on page 158, we can express this formally in the following way:


THEOREM 8.2 For any positive constant $c$, the probability that $\bar{x}$ will take
on a value between $\mu - c$ and $\mu + c$ is at least $1 - \dfrac{\sigma^2}{nc^2}$; when $n \to \infty$, this
probability approaches 1.


This result, called a law of large numbers, is primarily of theoretical interest.
Of much more practical value is the central limit theorem, one of the most
important theorems of statistics, which concerns the limiting distribution of the
standardized mean of $n$ random variables when $n \to \infty$. We shall prove this
theorem here only for the case where the $n$ random variables are a random sample
from a population whose moment-generating function exists. More general conditions
under which the theorem holds are given in Exercises 8 and 9 on pages
281 and 282, and the most general conditions under which it holds are referred
to at the end of this chapter.


THEOREM 8.3 (Central limit theorem) If $x_1, x_2, \ldots,$ and $x_n$ constitute a
random sample from an infinite population having the mean $\mu$, the variance
$\sigma^2$, and the moment-generating function $M_x(t)$, then the limiting distribution
of
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution.


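Proof. First using part 3 of Theorem 4.10 and then part 2, we get
$$M_z(t) = M_{\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}}(t) = e^{-\sqrt{n}\,\mu t/\sigma} \cdot M_{\bar{x}}\left(\frac{\sqrt{n}\,t}{\sigma}\right) = e^{-\sqrt{n}\,\mu t/\sigma} \cdot M_{n\bar{x}}\left(\frac{t}{\sigma\sqrt{n}}\right)$$
Since $n\bar{x} = x_1 + x_2 + \cdots + x_n$, it follows from Theorem 7.3 that
$$M_z(t) = e^{-\sqrt{n}\,\mu t/\sigma} \cdot \left[M_x\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n$$
and, hence, that
$$\ln M_z(t) = -\frac{\sqrt{n}\,\mu}{\sigma}\,t + n \cdot \ln M_x\left(\frac{t}{\sigma\sqrt{n}}\right)$$
Expanding $M_x\left(\frac{t}{\sigma\sqrt{n}}\right)$ as a power series in $t$, we obtain
$$\ln M_z(t) = -\frac{\sqrt{n}\,\mu}{\sigma}\,t + n \cdot \ln\left[1 + \mu_1'\frac{t}{\sigma\sqrt{n}} + \mu_2'\frac{t^2}{2\sigma^2 n} + \mu_3'\frac{t^3}{6\sigma^3 n\sqrt{n}} + \cdots\right]$$
where $\mu_1', \mu_2', \mu_3', \ldots$ are the moments about the origin of the population distribution,
namely, those of the distribution of the original random variables $x_i$.

If $n$ is sufficiently large, we can use the expansion of $\ln(1 + x)$ as a
power series in $x$ (as on page 228), getting
$$\begin{aligned} \ln M_z(t) = -\frac{\sqrt{n}\,\mu}{\sigma}\,t + n \Bigg\{ &\left[\mu_1'\frac{t}{\sigma\sqrt{n}} + \mu_2'\frac{t^2}{2\sigma^2 n} + \mu_3'\frac{t^3}{6\sigma^3 n\sqrt{n}} + \cdots\right] \\ &- \frac{1}{2}\left[\mu_1'\frac{t}{\sigma\sqrt{n}} + \mu_2'\frac{t^2}{2\sigma^2 n} + \mu_3'\frac{t^3}{6\sigma^3 n\sqrt{n}} + \cdots\right]^2 \\ &+ \frac{1}{3}\left[\mu_1'\frac{t}{\sigma\sqrt{n}} + \mu_2'\frac{t^2}{2\sigma^2 n} + \mu_3'\frac{t^3}{6\sigma^3 n\sqrt{n}} + \cdots\right]^3 - \cdots \Bigg\} \end{aligned}$$
Then, collecting powers of $t$, we obtain
$$\ln M_z(t) = \left(-\frac{\sqrt{n}\,\mu}{\sigma} + \frac{\sqrt{n}\,\mu_1'}{\sigma}\right)t + \left(\frac{\mu_2'}{2\sigma^2} - \frac{(\mu_1')^2}{2\sigma^2}\right)t^2 + \left(\frac{\mu_3'}{6\sigma^3\sqrt{n}} - \frac{\mu_1'\mu_2'}{2\sigma^3\sqrt{n}} + \frac{(\mu_1')^3}{3\sigma^3\sqrt{n}}\right)t^3 + \cdots$$
and since $\mu_1' = \mu$ and $\mu_2' - (\mu_1')^2 = \sigma^2$, this reduces to
$$\ln M_z(t) = \frac{1}{2}t^2 + \left(\frac{\mu_3'}{6} - \frac{\mu_1'\mu_2'}{2} + \frac{(\mu_1')^3}{3}\right)\frac{t^3}{\sigma^3\sqrt{n}} + \cdots$$
Finally, observe that the coefficient of $t^3$ is a constant times $\dfrac{1}{\sqrt{n}}$ and that,
in general, for $r > 2$ the coefficient of $t^r$ is a constant times $\dfrac{1}{\sqrt{n^{r-2}}}$. Hence we get
$$\lim_{n \to \infty} \ln M_z(t) = \frac{1}{2}t^2$$
and, hence,
$$\lim_{n \to \infty} M_z(t) = e^{\frac{1}{2}t^2}$$
since the limit of a logarithm equals the logarithm of the limit (provided
these limits exist). Identifying the limiting moment-generating function at
which we have arrived as that of the standard normal distribution, we
need only the two theorems stated on page 229 to complete the proof of
Theorem 8.3. ■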
Sometimes, the central limit theorem is interpreted incorrectly as implying
that the distribution of $\bar{x}$ approaches a normal distribution when $n \to \infty$. This is
incorrect because $\operatorname{var}(\bar{x}) \to 0$ when $n \to \infty$. On the other hand, the central limit
theorem does justify the approximation of the distribution of $\bar{x}$ with a normal
distribution having the mean $\mu$ and the variance $\dfrac{\sigma^2}{n}$ when $n$ is large. In practice,
this approximation is used when $n \ge 30$ regardless of the shape of the population
sampled. For smaller values of $n$ the approximation is questionable, but see
Theorem 8.4 below.
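The warning about small samples can be seen in a quick simulation. The Python sketch below uses only the standard library. It assumes an exponential population with $\mu = \sigma = 1$, chosen because that population is strongly skewed. The sketch computes the proportion of standardized means $(\bar x - \mu)/(\sigma/\sqrt n)$ that fall at or below $-1$ and compares it with the standard normal value $\Phi(-1) \approx 0.1587$. For $n = 2$ the exact proportion is $1 - e^{-(2 - \sqrt 2)}(3 - \sqrt 2) \approx 0.117$, which is noticeably off from the normal value. For $n = 30$ the proportion should be much closer to it.

```python
import math
import random

def standard_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def standardized_means(n, reps=20_000, seed=3):
    """Simulated values of (xbar - mu) / (sigma / sqrt(n)) for an exponential population with mean 1."""
    rng = random.Random(seed)
    mu = sigma = 1.0  # an exponential distribution with beta = 1 has mean 1 and standard deviation 1
    values = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        values.append((xbar - mu) / (sigma / math.sqrt(n)))
    return values

def fraction_at_or_below(values, z):
    return sum(v <= z for v in values) / len(values)

for n in (2, 30):
    print(n, round(fraction_at_or_below(standardized_means(n), -1.0), 3),
          round(standard_normal_cdf(-1.0), 4))
```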


EXAMPLE 8.1 


A soft-drink vending machine is set so that the amount of drink dispensed is a
random variable with a mean of 200 milliliters and a standard deviation of 15
milliliters. What is the probability that the average (mean) amount dispensed in
a random sample of size 36 is at least 204 milliliters?

Solution
According to Theorem 8.1, the distribution of $\bar{x}$ has the mean $\mu_{\bar{x}} = 200$
and the standard deviation $\sigma_{\bar{x}} = \dfrac{15}{\sqrt{36}} = 2.5$, and according to the central
limit theorem, this distribution is approximately normal. Since
$z = \dfrac{204 - 200}{2.5} = 1.6$, it follows from Table III that
$$P(\bar{x} \ge 204) = P(z \ge 1.6) = 0.5000 - 0.4452 = 0.0548$$
▲
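The arithmetic of Example 8.1 can be reproduced without Table III. The Python sketch below evaluates the standard normal tail probability with `math.erfc` from the standard library in place of the table lookup.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, computed with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n, cutoff = 200, 15, 36, 204  # values from Example 8.1
se = sigma / math.sqrt(n)                # standard error of the mean
z = (cutoff - mu) / se
print(se, z, round(normal_upper_tail(z), 4))  # 2.5 1.6 0.0548
```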


It is of interest to note that when the population we are sampling is normal,
the distribution of $\bar{x}$ is a normal distribution regardless of the size of $n$.


THEOREM 8.4 If $\bar{x}$ is the mean of a random sample of size $n$ from a normal
population with the mean $\mu$ and the variance $\sigma^2$, its sampling distribution
is a normal distribution with the mean $\mu$ and the variance $\sigma^2/n$.

Proof. According to Theorems 4.10 and 7.3 we can write
$$M_{\bar{x}}(t) = \left[M_x\left(\frac{t}{n}\right)\right]^n$$
and since the moment-generating function of a normal distribution with
the mean $\mu$ and the variance $\sigma^2$ is given by
$$M_x(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$$
according to Theorem 6.6, we get
$$M_{\bar{x}}(t) = \left[e^{\mu \cdot \frac{t}{n} + \frac{1}{2}\sigma^2\left(\frac{t}{n}\right)^2}\right]^n = e^{\mu t + \frac{1}{2} \cdot \frac{\sigma^2}{n} t^2}$$
This moment-generating function is readily seen to be that of a normal
distribution with the mean $\mu$ and the variance $\sigma^2/n$, and to complete the
proof of Theorem 8.4 we have only to refer to the two theorems on page
229. ■


8.3 THE DISTRIBUTION OF THE MEAN: 
FINITE POPULATIONS 


If an experiment consists of selecting one or more values from a finite set of
numbers $\{c_1, c_2, \ldots, c_N\}$, this set is referred to as a finite population of size $N$.
In the definition which follows, it will be assumed that we are sampling without
replacement from a finite population of size $N$.


DEFINITION 8.3 If $x_1$ is the first value drawn, $x_2$ is the second value
drawn, ..., $x_n$ is the $n$th value drawn, and the joint probability distribution
of these random variables is given by
$$f(x_1, x_2, \ldots, x_n) = \frac{1}{N(N - 1) \cdots (N - n + 1)}$$
for each ordered $n$-tuple of values selected, then $x_1, x_2, \ldots,$ and $x_n$ are said
to constitute a random sample from the finite population.


It follows that the probability for each subset of $n$ of the $N$ elements of the finite
population (regardless of the order in which the values are drawn) is
$$n! \cdot \frac{1}{N(N - 1) \cdots (N - n + 1)} = \frac{1}{\binom{N}{n}}$$
and this is often given as the requirement for the selection of a random sample
of size $n$ from a finite population of size $N$.
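For a small population this statement can be verified by brute force. The Python sketch below uses only the standard library, with $N = 6$ and $n = 3$ as illustrative values. It lists every equally likely ordered draw, groups the draws by the subset they produce, and confirms that each subset has probability $n!/[N(N-1)\cdots(N-n+1)] = 1/\binom{N}{n}$.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import permutations

def subset_probability(N, n):
    """n! ordered draws give the same subset; each has probability 1 / [N(N-1)...(N-n+1)]."""
    return Fraction(math.factorial(n), math.perm(N, n))

N, n = 6, 3
ordered_draws = list(permutations(range(N), n))              # all N(N-1)...(N-n+1) ordered draws
counts = Counter(frozenset(draw) for draw in ordered_draws)  # ordered draws per subset
probs = {Fraction(c, len(ordered_draws)) for c in counts.values()}
print(len(counts), probs, Fraction(1, math.comb(N, n)))
```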
It also follows from the above joint distribution that the marginal distribution
of $x_r$ is
$$f(x_r) = \frac{1}{N} \quad \text{for } x_r = c_1, c_2, \ldots, c_N$$
for $r = 1, 2, \ldots, n$, and we refer to its mean and its variance as the mean and
the variance of the finite population.


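DEFINITION 8.4 The mean and the variance of the finite population
$\{c_1, c_2, \ldots, c_N\}$ are
$$\mu = \sum_{i=1}^{N} c_i \cdot \frac{1}{N} \quad \text{and} \quad \sigma^2 = \sum_{i=1}^{N} (c_i - \mu)^2 \cdot \frac{1}{N}$$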


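The joint marginal distribution of any two of the random variables $x_1, x_2, \ldots,$ and $x_n$ is given by
$$g(x_r, x_s) = \frac{1}{N(N - 1)}$$
for each ordered pair of values of the finite population, and it follows that their
covariance is
$$\begin{aligned} \operatorname{cov}(x_r, x_s) &= \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i}}^{N} \frac{1}{N(N - 1)} (c_i - \mu)(c_j - \mu) \\ &= \frac{1}{N(N - 1)} \left[ \left( \sum_{i=1}^{N} (c_i - \mu) \right)^2 - \sum_{i=1}^{N} (c_i - \mu)^2 \right] \\ &= \frac{1}{N(N - 1)} \left[ 0 - N\sigma^2 \right] = -\frac{\sigma^2}{N - 1} \end{aligned}$$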
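The covariance formula can likewise be checked exactly with rational arithmetic. The Python sketch below uses hypothetical population values. It averages $(c_i - \mu)(c_j - \mu)$ over all $N(N-1)$ ordered pairs with $i \ne j$ and compares the result with $-\sigma^2/(N-1)$, where $\sigma^2$ uses the divisor $N$ of Definition 8.4.

```python
from fractions import Fraction
from itertools import permutations

population = [2, 3, 5, 7, 11]  # hypothetical finite population c_1, ..., c_N
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((c - mu) ** 2 for c in population) / N  # divisor N, as in Definition 8.4

# Each ordered pair of distinct positions (i, j) has probability 1 / [N(N - 1)].
pairs = list(permutations(population, 2))
cov = sum((a - mu) * (b - mu) for a, b in pairs) / len(pairs)
print(cov, -sigma2 / (N - 1))
```

Using `Fraction` keeps every step exact, so the two printed values agree exactly rather than approximately.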


Making use of all these results, let us now prove the following theorem: 


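THEOREM 8.5 If $\bar{x}$ is the mean of a random sample of size $n$ from a finite
population of size $N$ with the mean $\mu$ and the variance $\sigma^2$, then
$$E(\bar{x}) = \mu \quad \text{and} \quad \operatorname{var}(\bar{x}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$$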
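Proof. Substituting $a_i = \frac{1}{n}$, $\operatorname{var}(x_i) = \sigma^2$, and $\operatorname{cov}(x_i, x_j) = -\dfrac{\sigma^2}{N - 1}$ into Theorem 4.14, we get
$$E(\bar{x}) = \sum_{i=1}^{n} \frac{1}{n} \cdot \mu = \mu$$
and
$$\begin{aligned} \operatorname{var}(\bar{x}) &= \sum_{i=1}^{n} \frac{1}{n^2} \cdot \sigma^2 + 2 \sum_{i<j} \frac{1}{n^2} \left( -\frac{\sigma^2}{N - 1} \right) \\ &= \frac{\sigma^2}{n} + 2 \cdot \frac{n(n - 1)}{2} \cdot \frac{1}{n^2} \left( -\frac{\sigma^2}{N - 1} \right) \\ &= \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \end{aligned}$$
■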


It is of interest to note that the formulas we obtained for $\operatorname{var}(\bar{x})$ in Theorems
8.1 and 8.5 differ only by the finite population correction factor $\dfrac{N - n}{N - 1}$. Indeed,
when $N$ is large compared to $n$, the difference between the two formulas for
$\operatorname{var}(\bar{x})$ is generally negligible, and the formula $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ is often used as an
approximation when we are sampling from a large finite population. A general
rule of thumb is to use this approximation so long as the sample does not
constitute more than 5 percent of the population.
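Theorem 8.5 can be confirmed exactly for a small population by listing every possible sample. The Python sketch below uses hypothetical population values and `itertools.combinations`. Listing unordered subsets is enough, because every subset of size $n$ is equally likely and $\bar x$ does not depend on the order of the draws. The sketch compares $\operatorname{var}(\bar x)$ with both $\sigma^2/n$ and $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.

```python
from fractions import Fraction
from itertools import combinations

def mean_and_variance_of_xbar(population, n):
    """Exact E(xbar) and var(xbar) over all equally likely samples drawn without replacement."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v

population = [2, 3, 5, 7, 11]  # hypothetical finite population
N, n = len(population), 3
mu = Fraction(sum(population), N)
sigma2 = sum((c - mu) ** 2 for c in population) / N

m, v = mean_and_variance_of_xbar(population, n)
fpc = Fraction(N - n, N - 1)  # finite population correction factor
print(m == mu, v == sigma2 / n * fpc, v == sigma2 / n)  # True True False
```

Here the sample is 60 percent of the population, so the correction factor $(5 - 3)/(5 - 1) = 1/2$ is far from 1. The uncorrected formula $\sigma^2/n$ therefore fails.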


THEORETICAL EXERCISES


1. Verify the following computing formula for the value of a sample variance:


2. Use the Corollary of Theorem 4.15 to show that if $x_1, x_2, \ldots,$ and $x_n$ constitute
a random sample from an infinite population, then
$$\operatorname{cov}(x_r - \bar{x}, \bar{x}) = 0$$
for $r = 1, 2, \ldots, n$.
3. Use Theorem 4.14 and its Corollary to show that if $x_{11}, x_{12}, \ldots, x_{1n_1}, x_{21}, x_{22}, \ldots, x_{2n_2}$ are independent random variables, with the first $n_1$ constituting
a random sample from an infinite population with the mean $\mu_1$ and the
variance $\sigma_1^2$, and the other $n_2$ constituting a random sample from an infinite
population with the mean $\mu_2$ and the variance $\sigma_2^2$, then
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$
and
$$\operatorname{var}(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$


4. Show that if the two samples of the preceding exercise come from normal
populations, then $\bar{x}_1 - \bar{x}_2$ has a normal distribution with the mean $\mu_1 - \mu_2$
and the variance $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$. (Hint: Proceed as in the proof of Theorem 8.4.)

5. If $x_1, x_2, \ldots,$ and $x_n$ are independent random variables having identical
Bernoulli distributions with the parameter $\theta$, then $\bar{x}$ is the proportion of
successes in $n$ trials, which we denote by $\hat{\theta}$. Verify that $E(\hat{\theta}) = \theta$ and
$$\operatorname{var}(\hat{\theta}) = \frac{\theta(1 - \theta)}{n}$$

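6. If the first $n_1$ random variables of Exercise 3 have Bernoulli distributions
with the parameter $\theta_1$, and the other $n_2$ random variables have Bernoulli
distributions with the parameter $\theta_2$, show that
$$E(\hat{\theta}_1 - \hat{\theta}_2) = \theta_1 - \theta_2$$
and
$$\operatorname{var}(\hat{\theta}_1 - \hat{\theta}_2) = \frac{\theta_1(1 - \theta_1)}{n_1} + \frac{\theta_2(1 - \theta_2)}{n_2}$$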
7. Looking at the binomial distribution as on page 268, use the central limit
theorem to prove Theorem 6.8.


8. The following is a sufficient condition for the central limit theorem: If the
random variables $x_1, x_2, \ldots,$ and $x_n$ are independent and uniformly bounded
(that is, there is a positive constant $k$ such that the probability that any one of
the $x_i$ takes on a value greater than $k$ or less than $-k$ is 0), then if the variance
of $y_n = x_1 + x_2 + \cdots + x_n$ becomes infinite when $n \to \infty$, the distribution of
the standardized mean of the $x_i$ approaches the standard normal distribution.
Show that this sufficient condition holds for a sequence of independent
random variables $x_i$ having the probability distributions
$$f(x_i) = \begin{cases} \frac{1}{2} & \text{for } x_i = 1 - \left(\frac{1}{2}\right)^i \\ \frac{1}{2} & \text{for } x_i = \left(\frac{1}{2}\right)^i - 1 \end{cases}$$


9. The following is a sufficient condition, the Laplace-Liapounoff condition,
for the central limit theorem: If $x_1, x_2, x_3, \ldots$ is a sequence of independent
random variables, each having an absolute third moment
$$c_i = E(|x_i - \mu_i|^3)$$
and if
$$\lim_{n \to \infty} [\operatorname{var}(y_n)]^{-3/2} \cdot \sum_{i=1}^{n} c_i = 0$$
where $y_n = x_1 + x_2 + \cdots + x_n$, then the distribution of the standardized mean
of the $x_i$ approaches the standard normal distribution when $n \to \infty$. Use this
condition to show that the central limit theorem holds for the sequence of
random variables of Exercise 8.


If x, x, X5,...,is a sequence of independent random variables having the 
uniform densities 


1 
NEL forüc x «2-1 

i 
f(x) = DUE 


0 elsewhere 


show that the central limit theorem holds using 
(a) the conditions of Exercise 8; 
(b) the conditions of Exercise 9. 


Explain why the results of Theorem 8.1 apply instead of those of Theorem 
8.5 when we sample with replacement from a finite population. 
Explain the results of Exercise 11 on page 199 in the light of Theorem 8.5. 


If a random sample of size n is selected from the finite population which 
consists of the first N Positive integers, show that 


(a) the mean of the distribution of x is Де 


(b) the variance of the distribution of x is QUE т, 
n 
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(c) the mean and the variance of the distribution of y = n: x аге · 


E(y) = EN 1) апа уаг(у) = RUNE DONC n UY ihn 


(Hint: Refer to Appendix II or the results of Exercise 1 on page 183.) 
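The formulas of Exercise 13 can be checked numerically before they are derived. The sketch below is our own illustration (the function name `xbar_moments` is hypothetical): it enumerates every sample of size $n$ drawn without replacement from $\{1, 2, \ldots, N\}$ for small, arbitrarily chosen $N$ and $n$, treats all samples as equally likely, and computes the mean and variance of $\bar{x}$ exactly with rational arithmetic, so the results can be compared with $\frac{N+1}{2}$ and $\frac{(N+1)(N-n)}{12n}$.

```python
from fractions import Fraction
from itertools import combinations


def xbar_moments(N, n):
    """Exact mean and variance of the sample mean when a sample of size n is
    drawn without replacement from the population 1, 2, ..., N.

    Every one of the C(N, n) samples is equally likely, so we simply average
    over all of them using exact rational arithmetic.
    """
    means = [Fraction(sum(s), n) for s in combinations(range(1, N + 1), n)]
    count = len(means)
    mean = sum(means) / count
    var = sum((m - mean) ** 2 for m in means) / count
    return mean, var


N, n = 10, 4
print(xbar_moments(N, n))
print(Fraction(N + 1, 2), Fraction((N + 1) * (N - n), 12 * n))
```

Brute-force enumeration is only practical for small $N$, since the number of samples grows like $\binom{N}{n}$; it is a check on the algebra, not a substitute for it.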


APPLIED EXERCISES 


14. How many different samples of size $n = 3$ can be drawn from a finite population of size
(a) $N = 12$; (b) $N = 20$; (c) $N = 50$?

15. If a random sample of size $n = 4$ is drawn from a finite population of size $N = 200$, what is the probability of each possible sample?

16. If a random sample of size $n = 3$ is drawn from a finite population of size $N = 50$, what is the probability that a particular element of the population will be included in the sample?

17. For random sampling from an infinite population, what happens to the standard error of the mean if the sample size is
(a) increased from 30 to 120;
(b) increased from 80 to 180;
(c) decreased from 450 to 50;
(d) decreased from 250 to 40?

18. Find the value of the finite population correction factor for
(a) $n = 5$ and $N = 200$;
(b) $n = 50$ and $N = 300$;
(c) $n = 200$ and $N = 800$.

19. A random sample of size $n = 100$ is taken from an infinite population with the mean $\mu = 75$ and the variance $\sigma^2 = 256$. If we use Chebyshev's theorem, with what probability can we assert that the value we obtain for $\bar{x}$ will fall between 67 and 83?

20. Use the central limit theorem instead of Chebyshev's theorem to rework the preceding exercise.

21. A random sample of size $n = 81$ is taken from an infinite population with the mean $\mu = 128$ and the standard deviation $\sigma = 6.3$. With what probability can we assert that the value we obtain for $\bar{x}$ will not fall between 126.6 and 129.4, if we use
(a) Chebyshev's theorem;
(b) the central limit theorem?

22. Rework part (b) of the preceding exercise, assuming that the population is not infinite, but finite and of size $N = 400$.
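The arithmetic in Exercises 17 through 22 comes down to the standard error $\sigma/\sqrt{n}$, the finite population correction factor $\sqrt{(N-n)/(N-1)}$, and two ways of turning a standard error into a probability: Chebyshev's theorem gives only the lower bound $1 - 1/k^2$, while the central limit theorem gives an approximate value. The sketch below illustrates this under those formulas; the helper names are our own, and the normal distribution function is computed from Python's `math.erf`.

```python
import math


def standard_error(sigma, n, N=None):
    """sigma/sqrt(n), multiplied by the finite population correction
    sqrt((N - n)/(N - 1)) when a population size N is supplied."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(mu, sigma, n, half_width, N=None):
    """Chebyshev lower bound and CLT approximation for P(|xbar - mu| < half_width)."""
    k = half_width / standard_error(sigma, n, N)
    chebyshev = max(0.0, 1.0 - 1.0 / k ** 2)
    clt = 2.0 * normal_cdf(k) - 1.0
    return chebyshev, clt


# Setting of Exercises 19 and 20: mu = 75, sigma = 16, n = 100, interval 67 to 83.
print(prob_within(75, 16, 100, 8))
# Exercise 22 uses the correction factor with N = 400.
print(standard_error(6.3, 81), standard_error(6.3, 81, N=400))
```

The gap between the two printed probabilities shows how conservative Chebyshev's theorem is when the sampling distribution is in fact close to normal.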
23. A random sample of size 64 is taken from a normal population with $\mu = 51.4$ and $\sigma = 6.8$. What is the probability that the mean of the sample will
(a) exceed 52.9;
(b) fall between 50.5 and 52.3;
(c) be less than 50.6?

24. A random sample of size 100 is taken from a normal population with $\sigma = 25$. What is the probability that the mean of the sample will differ from the mean of the population by 3 or more either way?

25. Independent random samples of size 400 are taken from each of two populations having equal means and the standard deviations $\sigma_1 = 20$ and $\sigma_2 = 30$. Using Chebyshev's theorem and the results of Exercise 3, what can we assert with a probability of at least 0.99 about the value we will get for $\bar{x}_1 - \bar{x}_2$? (By "independent" we mean that the samples satisfy the conditions of Exercise 3.)

26. Assuming that the two populations of the preceding exercise are normal, use the result of Exercise 4 to find $k$ such that $P(-k < \bar{x}_1 - \bar{x}_2 < k) = 0.99$.

27. Independent random samples of size $n_1 = 30$ and $n_2 = 50$ are taken from two normal populations having the means $\mu_1 = 78$ and $\mu_2 = 75$, and the variances $\sigma_1^2 = 150$ and $\sigma_2^2 = 200$. Use the results of Exercise 4 to find the probability that the mean of the first sample will exceed that of the second sample by at least 4.8.


28. The actual proportion of families in a certain city who own, rather than rent, their home is 0.70. If 84 families in this city are interviewed at random and their responses to the question whether or not they own their home are looked upon as values of independent random variables having identical Bernoulli distributions with the parameter $\theta = 0.70$, with what probability can we assert that the value we obtain for the sample proportion, $\hat{\theta}$, will fall between 0.64 and 0.76, using the result of Exercise 5 and
(a) Chebyshev's theorem;
(b) the central limit theorem?

29. The actual proportion of men who favor a certain tax proposal is 0.40 and the corresponding proportion for women is 0.25; $n_1 = 500$ men and $n_2 = 400$ women are interviewed at random and their individual responses are looked upon as the values of independent random variables having Bernoulli distributions with the respective parameters $\theta_1 = 0.40$ and $\theta_2 = 0.25$. What can we assert, according to Chebyshev's theorem, with a probability of at least 0.9975 about the value we will get for $\hat{\theta}_1 - \hat{\theta}_2$, the difference between the two sample proportions of favorable responses? Use the result of Exercise 6.
8.4 THE CHI-SQUARE DISTRIBUTION


In Example 7.9 we showed that if $x$ has the standard normal distribution, then $x^2$ has the special gamma distribution which we referred to as the chi-square distribution, and this accounts for the important role which the chi-square distribution plays in problems of sampling from normal populations. To review some of the results of Section 6.3, a random variable $x$ has the chi-square distribution (also written $\chi^2$ distribution) with $\nu$ degrees of freedom if its density is given by

$$f(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\frac{\nu-2}{2}} e^{-x/2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

The mean and the variance of the chi-square distribution with $\nu$ degrees of freedom are $\nu$ and $2\nu$, and its moment-generating function is given by

$$M_x(t) = (1 - 2t)^{-\nu/2}$$


The chi-square distribution has several important mathematical properties, 
which are given in Theorems 8.6 through 8.9. First, let us formally state the result 
of Example 7.9, which we referred to above. 


THEOREM 8.6 If $x$ has the standard normal distribution, then $x^2$ has the chi-square distribution with $\nu = 1$ degree of freedom.


More generally, let us show that 


THEOREM 8.7 If $x_1, x_2, \ldots,$ and $x_n$ are independent random variables having standard normal distributions, then

$$y = \sum_{i=1}^{n} x_i^2$$

has the chi-square distribution with $\nu = n$ degrees of freedom.

Proof. By Theorem 8.6,

$$M_{x_i^2}(t) = (1 - 2t)^{-1/2}$$

so that, by Theorem 7.3, it follows that

$$M_y(t) = \prod_{i=1}^{n} (1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}$$

This moment-generating function can be identified by inspection as that of the chi-square distribution with $\nu = n$ degrees of freedom. ▼
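Theorem 8.7 is easy to illustrate by simulation. The sketch below, which assumes nothing beyond Python's standard `random` module (the function name is our own), draws many values of $y = x_1^2 + \cdots + x_n^2$ from independent standard normal $x_i$ and compares their mean and variance with the chi-square values $\nu = n$ and $2\nu$. A simulation of this kind illustrates the theorem; it does not prove it.

```python
import random


def sum_of_squares_draws(n, reps, seed=1):
    """Simulate reps values of y = x_1^2 + ... + x_n^2 with x_i independent N(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(reps)]


n = 5
ys = sum_of_squares_draws(n, reps=20000)
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
print(f"mean {mean:.2f} (theory {n}), variance {var:.2f} (theory {2 * n})")
```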


Two further properties of the chi-square distribution are given in the 
following two theorems, which the reader will be asked to prove in Exercises 1 
and 2 on page 295: 


THEOREM 8.8 If $x_1, x_2, \ldots,$ and $x_n$ are independent random variables having chi-square distributions with $\nu_1, \nu_2, \ldots,$ and $\nu_n$ degrees of freedom, then

$$y = \sum_{i=1}^{n} x_i$$

has the chi-square distribution with $\nu_1 + \nu_2 + \cdots + \nu_n$ degrees of freedom.


THEOREM 8.9 If $x_1$ and $x_2$ are independent random variables, $x_1$ has a chi-square distribution with $\nu_1$ degrees of freedom, and $x_1 + x_2$ has a chi-square distribution with $\nu > \nu_1$ degrees of freedom, then $x_2$ has a chi-square distribution with $\nu - \nu_1$ degrees of freedom.


The chi-square distribution has many important applications, some of which 
will be discussed in Chapters 10 through 13. Foremost, there are those based, 
directly or indirectly, on the following theorem: 


THEOREM 8.10 If $\bar{x}$ and $s^2$ are the mean and the variance of a random sample of size $n$ from a normal population with the mean $\mu$ and the variance $\sigma^2$, then

1. $\bar{x}$ and $s^2$ are independent;

2. the random variable $\frac{(n-1)s^2}{\sigma^2}$ has a chi-square distribution with $n - 1$ degrees of freedom.


Proof. Since a detailed proof of part 1 would go beyond the scope of this text, we shall assume the independence of $\bar{x}$ and $s^2$ in our proof of part 2. (In addition to the references to proofs of part 1 on page 305, Exercise 8 on page 296 outlines the major steps of a somewhat simpler proof based on the idea of a conditional moment-generating function, and in Exercise 7 on page 296 the reader will be asked to prove this independence for the special case where $n = 2$.)

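To prove part 2, let us begin with the identity

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$

which the reader will be asked to verify in Exercise 3 on page 295. Now, if we divide each term by $\sigma^2$ and substitute $(n-1)s^2$ for $\sum_{i=1}^{n} (x_i - \bar{x})^2$, it follows that

$$\sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2 = \frac{(n-1)s^2}{\sigma^2} + \left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2$$

So far as the three terms of this identity are concerned, we know from Theorem 8.7 that $\sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2$ is a random variable having a chi-square distribution with $n$ degrees of freedom. Also, according to Theorems 8.4 and 8.6, $\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2$ has a chi-square distribution with 1 degree of freedom. Now, since $\bar{x}$ and $s^2$ are independent, it follows that $\frac{(n-1)s^2}{\sigma^2}$ and $\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2$ are independent and, hence, we conclude by Theorem 8.9 that $\frac{(n-1)s^2}{\sigma^2}$ has a chi-square distribution with $n - 1$ degrees of freedom. ▼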
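Part 2 of Theorem 8.10 can be illustrated the same way. In the sketch below (arbitrary choices $n = 6$, $\mu = 10$, $\sigma = 3$; all function names are our own) the simulated values of $\frac{(n-1)s^2}{\sigma^2}$ should have mean close to $n - 1$ and variance close to $2(n-1)$, the chi-square values for $n - 1$ degrees of freedom, whatever the value of $\mu$. We also print the sample correlation between $\bar{x}$ and $s^2$, which should be near zero; a near-zero correlation is consistent with, but does not establish, the independence asserted in part 1.

```python
import random


def xbar_and_scaled_s2(n, mu, sigma, reps, seed=2):
    """Return lists of xbar and (n-1)s^2/sigma^2 for repeated N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    xbars, stats = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # this is (n-1)s^2
        xbars.append(xbar)
        stats.append(ss / sigma ** 2)
    return xbars, stats


def mean_var(values):
    """Sample mean and sample variance (divisor len - 1)."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def correlation(a, b):
    """Sample correlation coefficient of two equally long lists."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (va * vb) ** 0.5


n = 6
xbars, stats = xbar_and_scaled_s2(n, mu=10.0, sigma=3.0, reps=20000)
m, v = mean_var(stats)
print(f"mean {m:.2f} (theory {n - 1}), variance {v:.2f} (theory {2 * (n - 1)})")
print(f"corr(xbar, s^2) = {correlation(xbars, stats):.3f}")
```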
Since the chi-square distribution arises in many important applications, integrals of its density have been extensively tabulated. Table V at the end of this book contains values of $\chi^2_{\alpha,\nu}$ for $\alpha = 0.995, 0.99, 0.975, 0.95, 0.05, 0.025, 0.01, 0.005$, and $\nu = 1, 2, \ldots, 30$, where $\chi^2_{\alpha,\nu}$ is such that the area to its right under the chi-square curve with $\nu$ degrees of freedom (see Figure 8.1) is equal to $\alpha$. That is, $\chi^2_{\alpha,\nu}$ is such that

$$P(\chi^2 > \chi^2_{\alpha,\nu}) = \alpha$$

When $\nu$ exceeds 30 so that Table V cannot be used, probabilities related to chi-square distributions are usually approximated with the use of normal distributions (see Exercises 5 and 6 on page 295).


Figure 8.1 Chi-square distribution.
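Where Table V is not at hand, values of $\chi^2_{\alpha,\nu}$ can be computed. The sketch below shows one way to do so; it is not the method used to prepare the table. Since the chi-square distribution is a gamma distribution with $\alpha = \nu/2$ and $\beta = 2$, the area to the left of $x$ is a regularized incomplete gamma function, which we evaluate with its standard power series; $\chi^2_{\alpha,\nu}$ is then found by bisection. The helper names are our own, and the series is adequate for the moderate arguments met here but is not tuned for extreme tails.

```python
import math


def chi2_sf(x, nu):
    """Area to the right of x under the chi-square curve with nu degrees of freedom.

    Uses the series P(a, z) = z^a e^(-z) / Gamma(a) * sum_k z^k / (a (a+1) ... (a+k))
    for the regularized lower incomplete gamma function, with a = nu/2 and z = x/2.
    """
    if x <= 0:
        return 1.0
    a, z = nu / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_upper_point(alpha, nu):
    """The value chi^2_{alpha,nu} with area alpha to its right, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, nu) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for alpha, nu in [(0.05, 1), (0.05, 10), (0.01, 19), (0.995, 30)]:
    print(alpha, nu, round(chi2_upper_point(alpha, nu), 3))
```

The printed values can be compared with the corresponding entries of Table V.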


EXAMPLE 8.2

Suppose that the thickness of a part used in a semiconductor is its critical dimension, and that the process of manufacturing these parts is considered to be under control if the true variation among the thicknesses of the parts is given by a standard deviation not greater than $\sigma = 0.60$ thousandth of an inch. To keep a check on the process, random samples of size $n = 20$ are taken periodically, and it is regarded to be “out of control” if the probability that $s^2$ will take on a value greater than or equal to the observed sample value is 0.01 or less (even though $\sigma = 0.60$). What can one conclude about the process if the standard deviation of such a periodic random sample is $s = 0.84$ thousandth of an inch?

Solution

The process will be declared “out of control” if $\frac{(n-1)s^2}{\sigma^2}$ with $n = 20$ and $\sigma = 0.60$ exceeds $\chi^2_{0.01,19} = 36.191$. Since

$$\frac{(n-1)s^2}{\sigma^2} = \frac{19(0.84)^2}{(0.60)^2} = 37.24$$

this is the case in our example. Of course, it is assumed in this analysis that the sample may be regarded as a random sample from a normal population. ▲
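The decision rule of Example 8.2 can be scripted as follows. As a self-contained helper, the sketch repeats the incomplete gamma series for the chi-square tail area (an implementation choice of ours, not something the example requires). Besides comparing the statistic with the tabulated $\chi^2_{0.01,19} = 36.191$, it reports the right-tail probability itself, which should come out just under 0.01.

```python
import math


def chi2_sf(x, nu):
    """Right-tail area of the chi-square distribution (incomplete gamma series)."""
    if x <= 0:
        return 1.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)


n, sigma, s = 20, 0.60, 0.84          # data of Example 8.2 (thousandths of an inch)
statistic = (n - 1) * s ** 2 / sigma ** 2
critical = 36.191                     # chi^2_{0.01,19}, as quoted from Table V
p_tail = chi2_sf(statistic, n - 1)

print(f"statistic = {statistic:.2f}")
print("out of control" if statistic > critical else "in control")
print(f"P(chi-square_19 >= {statistic:.2f}) = {p_tail:.4f}")
```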


8.5 THE t DISTRIBUTION


In Theorem 8.4 we showed that for random samples from a normal population with the mean $\mu$ and the variance $\sigma^2$, $\bar{x}$ has a normal distribution with the mean $\mu$ and the variance $\frac{\sigma^2}{n}$; that is, $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has the standard normal distribution. The major difficulty in applying this result is that in actual practice $\sigma$ is usually unknown, which makes it necessary to replace it with a value of the sample standard deviation $s$ (or some other estimate). The theory which follows leads to the exact distribution of $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ for random samples from normal populations.


To derive this sampling distribution, let us first study the more general 
problem stated in the following theorem: 


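THEOREM 8.11 If $y$ and $z$ are independent random variables, $y$ has a chi-square distribution with $\nu$ degrees of freedom, and $z$ has the standard normal distribution, then the distribution of

$$t = \frac{z}{\sqrt{y/\nu}}$$

is given by

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \quad \text{for } -\infty < t < \infty$$

and it is called the t distribution with $\nu$ degrees of freedom.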
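Proof. Since $y$ and $z$ are independent, their joint density is given by

$$f(y, z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}} \cdot \frac{1}{\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}}\, y^{\frac{\nu}{2} - 1} e^{-\frac{y}{2}}$$

for $y > 0$ and $-\infty < z < \infty$, and $f(y, z) = 0$ elsewhere. Then, to use the change-of-variable technique of Section 7.3, we solve $t = \frac{z}{\sqrt{y/\nu}}$ for $z$, getting $z = t\sqrt{y/\nu}$ and, hence, $\frac{\partial z}{\partial t} = \sqrt{y/\nu}$. Thus, by Theorem 7.1, the joint density of $y$ and $t$ is given by

$$g(y, t) = \begin{cases} \dfrac{1}{\sqrt{2\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}}\, y^{\frac{\nu-1}{2}} e^{-\frac{y}{2}\left(1 + \frac{t^2}{\nu}\right)} & \text{for } y > 0 \text{ and } -\infty < t < \infty \\ 0 & \text{elsewhere} \end{cases}$$

and, integrating out $y$ with the aid of the substitution $w = \frac{y}{2}\left(1 + \frac{t^2}{\nu}\right)$, we finally get

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \quad \text{for } -\infty < t < \infty$$

▼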
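The density of Theorem 8.11 is straightforward to evaluate with the log-gamma function, which avoids overflow in $\Gamma$ for large $\nu$. The sketch below (helper names ours) checks numerically that the density integrates to approximately 1, that for $\nu = 1$ its value at $t = 0$ is $1/\pi$ (as the formula gives, since $\Gamma(1) = 1$ and $\Gamma(\frac{1}{2}) = \sqrt{\pi}$), and that for large $\nu$ it is close to the standard normal density.

```python
import math


def t_density(t, nu):
    """Density of the t distribution with nu degrees of freedom (Theorem 8.11)."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * nu))
    return math.exp(log_const) * (1 + t * t / nu) ** (-(nu + 1) / 2)


def normal_density(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def midpoint_integral(fn, a, b, steps=100000):
    """Midpoint-rule approximation of the integral of fn over (a, b)."""
    h = (b - a) / steps
    return h * sum(fn(a + (i + 0.5) * h) for i in range(steps))


print(midpoint_integral(lambda t: t_density(t, 5), -50.0, 50.0))
for nu in (1, 5, 30, 1000):
    print(nu, round(t_density(1.0, nu), 5), round(normal_density(1.0), 5))
```

The second column moves toward the third as $\nu$ grows, which is the behavior Exercise 12 asks the reader to establish analytically.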
The t distribution was first obtained by W. S. Gosset, who published his research under the pen name “Student”; hence, the distribution is also known as the Student-t distribution, or Student's-t distribution.

In view of its importance, the t distribution has been tabulated extensively. Table IV, for example, contains values of $t_{\alpha,\nu}$ for $\alpha = 0.10, 0.05, 0.025, 0.01, 0.005$, and $\nu = 1, 2, \ldots, 29$, where $t_{\alpha,\nu}$ is such that the area to its right under the curve of the t distribution with $\nu$ degrees of freedom (see Figure 8.2) is equal to $\alpha$. That is, $t_{\alpha,\nu}$ is such that

$$P(t > t_{\alpha,\nu}) = \alpha$$

The table does not contain values of $t_{\alpha,\nu}$ for $\alpha > 0.50$, since the density is symmetrical about $t = 0$ and, hence, $t_{1-\alpha,\nu} = -t_{\alpha,\nu}$. When $\nu$ is 30 or more, probabilities related to the t distribution are usually approximated with the use of normal distributions (see Exercise 12 on page 297).
Figure 8.2 t distribution.


Among the many applications of the t distribution, some of which will be 
treated in Chapters 11 and 13, its major application (for which it was originally 
developed) is based on the following theorem: 


THEOREM 8.12 If $\bar{x}$ and $s^2$ are the mean and the variance of a random sample of size $n$ from a normal population with the mean $\mu$ and the variance $\sigma^2$, then

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has the t distribution with $n - 1$ degrees of freedom.


Proof. By Theorems 8.10 and 8.4, the random variables

$$y = \frac{(n-1)s^2}{\sigma^2} \quad \text{and} \quad z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

have, respectively, a chi-square distribution with $n - 1$ degrees of freedom and the standard normal distribution. Since they are, furthermore, independent by part 1 of Theorem 8.10, substitution into the formula for $t$ of Theorem 8.11 yields

$$t = \frac{\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}}{\sqrt{s^2/\sigma^2}} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

and this completes the proof. ▼
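To see Theorem 8.12 at work, the sketch below repeatedly draws samples of size $n = 16$ from a normal population (the choices $\mu = 12$ and $\sigma = 2$ are arbitrary; the function name is ours), computes $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ each time, and records how often it exceeds 2.947, the Table IV value for $\alpha = 0.005$ and 15 degrees of freedom that is quoted in Example 8.3. The relative frequency should be close to 0.005. For contrast it also records how often $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, with the true $\sigma$, exceeds the same cutoff; that frequency is noticeably smaller, which is why $s$ cannot simply be substituted for $\sigma$ in the normal theory.

```python
import math
import random


def exceedance_frequencies(n, mu, sigma, cutoff, reps, seed=3):
    """Relative frequencies with which (xbar - mu)/(s/sqrt(n)) and
    (xbar - mu)/(sigma/sqrt(n)) exceed `cutoff` in repeated normal samples."""
    rng = random.Random(seed)
    hits_t = hits_z = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        hits_t += (xbar - mu) / (s / math.sqrt(n)) > cutoff
        hits_z += (xbar - mu) / (sigma / math.sqrt(n)) > cutoff
    return hits_t / reps, hits_z / reps


freq_t, freq_z = exceedance_frequencies(16, 12.0, 2.0, 2.947, reps=60000)
print(f"t statistic: {freq_t:.4f} (Table IV: 0.005); z with sigma known: {freq_z:.4f}")
```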
EXAMPLE 8.3

In 16 one-hour test runs, the gasoline consumption of an engine averaged 16.4 gallons with a standard deviation of 2.1 gallons. Test the claim that the average gasoline consumption of this engine is 12.0 gallons per hour.

Solution

Substituting $n = 16$, $\mu = 12.0$, $\bar{x} = 16.4$, and $s = 2.1$ into the formula for $t$ in Theorem 8.12, we get

$$t = \frac{16.4 - 12.0}{2.1/\sqrt{16}} = 8.38$$

Since Table IV shows that the probability of getting a value of $t$ greater than 2.947 is 0.005 for 15 degrees of freedom, the probability of getting a value greater than 8 must be negligible. Thus, it would seem reasonable to conclude that the true average hourly gasoline consumption of the engine exceeds 12.0 gallons. ▲

8.6 THE F DISTRIBUTION


Another distribution which plays an important role in connection with sampling 
from normal populations is the F distribution. We shall define this distribution 
as the sampling distribution of the ratio of two independent chi-square random 
variables, each divided by its respective degrees of freedom. 


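THEOREM 8.13 If $u$ and $v$ are independent random variables having chi-square distributions with $\nu_1$ and $\nu_2$ degrees of freedom, then the distribution of

$$F = \frac{u/\nu_1}{v/\nu_2}$$

is given by

$$g(F) = \begin{cases} \dfrac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\dfrac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} F^{\frac{\nu_1}{2} - 1}\left(1 + \dfrac{\nu_1}{\nu_2}F\right)^{-\frac{\nu_1+\nu_2}{2}} & \text{for } F > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and it is called the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.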
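Proof. The joint density of $u$ and $v$ is given by

$$f(u, v) = \frac{1}{2^{\nu_1/2}\,\Gamma\left(\frac{\nu_1}{2}\right)}\, u^{\frac{\nu_1}{2} - 1} e^{-\frac{u}{2}} \cdot \frac{1}{2^{\nu_2/2}\,\Gamma\left(\frac{\nu_2}{2}\right)}\, v^{\frac{\nu_2}{2} - 1} e^{-\frac{v}{2}} = \frac{1}{2^{(\nu_1+\nu_2)/2}\,\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\, u^{\frac{\nu_1}{2} - 1} v^{\frac{\nu_2}{2} - 1} e^{-\frac{u+v}{2}}$$

for $u > 0$ and $v > 0$, and $f(u, v) = 0$ elsewhere. Then, to use the change-of-variable technique of Section 7.3, we solve $F = \frac{u/\nu_1}{v/\nu_2}$ for $u$, getting $u = \frac{\nu_1}{\nu_2} vF$ and, hence, $\frac{\partial u}{\partial F} = \frac{\nu_1}{\nu_2} v$. Thus, by Theorem 7.1, the joint density of $F$ and $v$ is given by

$$g(F, v) = \frac{\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}}{2^{(\nu_1+\nu_2)/2}\,\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\, F^{\frac{\nu_1}{2} - 1} v^{\frac{\nu_1+\nu_2}{2} - 1} e^{-\frac{v}{2}\left(\frac{\nu_1 F}{\nu_2} + 1\right)}$$

for $F > 0$ and $v > 0$, and $g(F, v) = 0$ elsewhere. Now, integrating out $v$ by making the substitution $w = \frac{v}{2}\left(\frac{\nu_1 F}{\nu_2} + 1\right)$, we finally get

$$g(F) = \begin{cases} \dfrac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\dfrac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} F^{\frac{\nu_1}{2} - 1}\left(1 + \dfrac{\nu_1}{\nu_2}F\right)^{-\frac{\nu_1+\nu_2}{2}} & \text{for } F > 0 \\ 0 & \text{elsewhere} \end{cases}$$

▼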
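As a check on the algebra, the density of Theorem 8.13 can be coded directly (again via log-gamma) and compared with a special case. For $\nu_1 = \nu_2 = 4$ the formula should reduce to $6F(1+F)^{-4}$, the density quoted in Exercise 15 below. The sketch (helper name ours) verifies this at a few arbitrary points and confirms numerically, with a midpoint rule on $(0, 200)$, that the density integrates to approximately 1.

```python
import math


def f_density(F, nu1, nu2):
    """Density of the F distribution with nu1 and nu2 degrees of freedom (Theorem 8.13)."""
    if F <= 0:
        return 0.0
    log_const = (math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2)
                 - math.lgamma(nu2 / 2) + (nu1 / 2) * math.log(nu1 / nu2))
    return (math.exp(log_const) * F ** (nu1 / 2 - 1)
            * (1 + nu1 * F / nu2) ** (-(nu1 + nu2) / 2))


for F in (0.25, 1.0, 2.0, 5.0):
    print(F, f_density(F, 4, 4), 6 * F * (1 + F) ** -4)

h = 0.001  # midpoint rule on (0, 200); for nu1 = nu2 = 4 the mass beyond 200 is below 1e-4
print(h * sum(f_density((i + 0.5) * h, 4, 4) for i in range(200000)))
```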


In view of its importance, the F distribution has been tabulated extensively. Table VI, for example, contains values of $F_{\alpha,\nu_1,\nu_2}$ for $\alpha = 0.05$ and 0.01, and for various values of $\nu_1$ and $\nu_2$, where $F_{\alpha,\nu_1,\nu_2}$ is such that the area to its right under the curve of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom (see Figure 8.3) is equal to $\alpha$. That is, $F_{\alpha,\nu_1,\nu_2}$ is such that

$$P(F > F_{\alpha,\nu_1,\nu_2}) = \alpha$$


Figure 8.3 F distribution.

Important applications of Theorem 8.13 arise in problems in which we are interested in comparing the variances $\sigma_1^2$ and $\sigma_2^2$ of two normal populations; for instance, in problems in which we want to estimate the ratio $\frac{\sigma_1^2}{\sigma_2^2}$ or perhaps test whether $\sigma_1^2 = \sigma_2^2$. We base such inferences on independent random samples of size $n_1$ and $n_2$ from the two populations and Theorem 8.10, according to which

$$\chi_1^2 = \frac{(n_1 - 1)s_1^2}{\sigma_1^2} \quad \text{and} \quad \chi_2^2 = \frac{(n_2 - 1)s_2^2}{\sigma_2^2}$$

are random variables having chi-square distributions with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. By “independent random samples” we mean that the $n_1 + n_2$ random variables constituting the two random samples are all independent, so that $\chi_1^2$ and $\chi_2^2$ are also independent and their substitution into Theorem 8.13 gives the following result:


THEOREM 8.14 If $s_1^2$ and $s_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ from normal populations with the variances $\sigma_1^2$ and $\sigma_2^2$, then

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{\sigma_2^2 s_1^2}{\sigma_1^2 s_2^2}$$

has an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
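A simulation makes Theorem 8.14 concrete. The sketch below uses $n_1 = 10$, $n_2 = 15$ and two arbitrary, unequal population standard deviations (function names ours); the simulated values of $\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ should behave like an F distribution with 9 and 14 degrees of freedom. We compare their average with $\frac{\nu_2}{\nu_2 - 2} = \frac{14}{12}$, the mean asked for in Exercise 18 below, and report the share falling below 4.03, which we take to be the tabulated $F_{0.01,9,14}$ (the value used in Exercise 28); that share should be close to 0.99.

```python
import random


def sample_variance(xs):
    """s^2 with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratio_draws(n1, n2, sigma1, sigma2, reps, seed=4):
    """Simulate (s1^2/sigma1^2)/(s2^2/sigma2^2) for independent normal samples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, sigma1) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, sigma2) for _ in range(n2)])
        draws.append((s1_sq / sigma1 ** 2) / (s2_sq / sigma2 ** 2))
    return draws


draws = variance_ratio_draws(10, 15, sigma1=2.0, sigma2=5.0, reps=20000)
mean_F = sum(draws) / len(draws)
share_below = sum(d < 4.03 for d in draws) / len(draws)
print(f"mean {mean_F:.3f} (14/12 = {14 / 12:.3f}), share below 4.03: {share_below:.4f}")
```

Dividing each sample variance by its own $\sigma^2$ is what removes the dependence on the unequal population variances.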


In Chapter 11 we shall apply Theorem 8.14 to the problem of estimating the ratio $\frac{\sigma_1^2}{\sigma_2^2}$ when these two population variances are unknown; also, in Chapter 13 we shall demonstrate how to test whether $\sigma_1^2 = \sigma_2^2$. Still other tests based on the F distribution are presented in the analysis-of-variance procedures of Chapter 15. Since all these applications are based on the ratios of sample variances, the F distribution is also known as the variance-ratio distribution.


THEORETICAL EXERCISES 
1. Prove Theorem 8.8. 

2. Prove Theorem 8.9. 

3. Verify the identity

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$

which we used in the proof of Theorem 8.10.


4. Use Theorem 8.10 to show that for random samples of size $n$ from a normal population with the variance $\sigma^2$, the sampling distribution of $s^2$ has the mean $\sigma^2$ and the variance $\frac{2\sigma^4}{n-1}$. (A general formula for the variance of the sampling distribution of $s^2$ for random samples from any population having finite second and fourth moments may be found in the book by H. Cramér listed on page 305.)

5. If the range of $x$ is the set of all positive real numbers, show that for $k > 0$
(a) the probability that the random variable $\sqrt{2x} - \sqrt{2n}$ takes on a value less than $k$ equals the probability that the random variable $\frac{x - n}{\sqrt{2n}}$ takes on a value less than $k + \frac{k^2}{2\sqrt{2n}}$;
(b) if $x$ has a chi-square distribution with $n$ degrees of freedom, then for large $n$ the distribution of $\sqrt{2x} - \sqrt{2n}$ can be approximated with the standard normal distribution.
Also use the result of part (b) and Theorem 8.10 to show that for large $n$ the variance of the sampling distribution of $s$ is approximately $\frac{\sigma^2}{2n}$.

6. Find approximate values for the probability that a random variable $x$ having a chi-square distribution with 50 degrees of freedom will take on a value greater than 68.0
(a) by treating $\frac{x - \nu}{\sqrt{2\nu}}$ with $\nu = 50$ as a random variable having the standard normal distribution;
(b) by treating $\sqrt{2x} - \sqrt{2\nu}$ with $\nu = 50$ as a random variable having the standard normal distribution.
Also, judge the merits of these approximations, given that the actual value of the probability (rounded to five decimals) is 0.04596.
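The two approximations of Exercise 6 can be evaluated by machine alongside a direct computation of the tail area. The sketch below is our own: it computes the normal tail from Python's `math.erfc` and the chi-square tail from the standard incomplete gamma series (a numerical method of our choosing, not one the text prescribes). The printed series value can be compared with the 0.04596 quoted in the exercise.

```python
import math


def chi2_sf(x, nu):
    """Right-tail area of the chi-square distribution (incomplete gamma series)."""
    if x <= 0:
        return 1.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)


def normal_sf(z):
    """P(Z > z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


x, nu = 68.0, 50
approx_a = normal_sf((x - nu) / math.sqrt(2 * nu))           # part (a)
approx_b = normal_sf(math.sqrt(2 * x) - math.sqrt(2 * nu))   # part (b)
exact = chi2_sf(x, nu)
print(f"(a) {approx_a:.5f}  (b) {approx_b:.5f}  series {exact:.5f}")
```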

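7. If $x_1$ and $x_2$ are independent random variables having the standard normal distribution, show that
(a) the joint density of $x_1$ and $\bar{x}$ is given by

$$f(x_1, \bar{x}) = \frac{1}{\pi}\, e^{-\bar{x}^2} e^{-(x_1 - \bar{x})^2}$$

for $-\infty < x_1 < \infty$ and $-\infty < \bar{x} < \infty$;
(b) the joint density of $u = |x_1 - \bar{x}|$ and $\bar{x}$ is given by

$$g(u, \bar{x}) = \frac{2}{\pi}\, e^{-(\bar{x}^2 + u^2)}$$

for $u > 0$ and $-\infty < \bar{x} < \infty$, since $f(x_1, \bar{x})$ is symmetrical about $\bar{x}$ for fixed $\bar{x}$;
(c) $s^2 = 2(x_1 - \bar{x})^2 = 2u^2$;
(d) the joint density of $\bar{x}$ and $s^2$ is given by

$$h(s^2, \bar{x}) = \frac{1}{\sqrt{2\pi}}\,(s^2)^{-\frac{1}{2}} e^{-\frac{s^2}{2}} \cdot \frac{1}{\sqrt{\pi}}\, e^{-\bar{x}^2}$$

for $s^2 > 0$ and $-\infty < \bar{x} < \infty$, so that $s^2$ and $\bar{x}$ are independent.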


8. If the random variables $x_1, x_2, \ldots,$ and $x_n$ constitute a random sample from a normal population with the mean $\mu$ and the variance $\sigma^2$,
(a) find the conditional density of $x_1$ given $x_2 = x_2, x_3 = x_3, \ldots, x_n = x_n$, and then set $x_1 = n\bar{x} - x_2 - \cdots - x_n$ and use the change-of-variable technique to find the conditional density of $\bar{x}$ given $x_2 = x_2, x_3 = x_3, \ldots, x_n = x_n$;
(b) find the joint density of $\bar{x}, x_2, x_3, \ldots, x_n$ by multiplying the conditional density of $\bar{x}$ in part (a) by the joint density of $x_2, x_3, \ldots, x_n$, and show that

$$g(x_2, x_3, \ldots, x_n \mid \bar{x}) = \sqrt{n}\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n-1} e^{-\frac{(n-1)s^2}{2\sigma^2}}$$

for $-\infty < x_i < \infty$, $i = 2, 3, \ldots, n$;
(c) show that the conditional moment-generating function of $\frac{(n-1)s^2}{\sigma^2}$ given $\bar{x} = \bar{x}$ is

$$E\left[e^{\frac{(n-1)s^2}{\sigma^2}t} \,\middle|\, \bar{x}\right] = (1 - 2t)^{-\frac{n-1}{2}} \quad \text{for } t < \frac{1}{2}$$

Since this result is free of $\bar{x}$, it follows that $\bar{x}$ and $s^2$ are independent; it also shows that $\frac{(n-1)s^2}{\sigma^2}$ has a chi-square distribution with $n - 1$ degrees of freedom.
This proof is due to J. Shuster, and it is referred to on page 305.

9. If $x$ and $y$ are independent, $x$ has a normal distribution with $\mu = 5$ and $\sigma^2 = 15$, and $y$ has a chi-square distribution with 5 degrees of freedom, find $P(x - 5 > 3\sqrt{y})$.

10. Use the method based on Theorem 7.2 to rework the proof of Theorem 8.11. (Hint: Let $t = \frac{z}{\sqrt{y/\nu}}$ and $u = y$.)

11. Show that for $\nu > 2$ the variance of the t distribution with $\nu$ degrees of freedom is $\frac{\nu}{\nu - 2}$.

12. Use Stirling's formula of Exercise 3 on page 18 to show that when $\nu \to \infty$ the t distribution approaches the standard normal distribution.

13. Use the method based on Theorem 7.2 to rework the proof of Theorem 8.13. (Hint: Let $F = \frac{u/\nu_1}{v/\nu_2}$ and $w = v$.)

14. Show that the t distribution with 1 degree of freedom is a Cauchy distribution.

15. Show that the F distribution with 4 and 4 degrees of freedom is given by

$$g(F) = \begin{cases} 6F(1 + F)^{-4} & \text{for } F > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and use this density to find the probability that for independent random samples of size 5 from two normal populations having the same variance, $s_1^2/s_2^2$ will take on a value less than $\frac{1}{2}$ or greater than 2.

16. If $x$ has an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom, show that $y = \frac{1}{x}$ has an F distribution with $\nu_2$ and $\nu_1$ degrees of freedom.

17. Use the result of Exercise 16 to show that

$$F_{1-\alpha,\nu_1,\nu_2} = \frac{1}{F_{\alpha,\nu_2,\nu_1}}$$

18. Show that for $\nu_2 > 2$ the mean of the F distribution is $\frac{\nu_2}{\nu_2 - 2}$, and investigate what happens when $\nu_2$ equals 1 and 2.
19. If $x$ has a beta distribution with $\alpha = \frac{\nu_1}{2}$ and $\beta = \frac{\nu_2}{2}$, show that

$$y = \frac{\nu_2 x}{\nu_1(1 - x)}$$

has an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.

20. If $x$ has an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom, show that when $\nu_2 \to \infty$ the distribution of $y = \nu_1 x$ approaches the chi-square distribution with $\nu_1$ degrees of freedom.

21. Show that if $x$ has the t distribution with $\nu$ degrees of freedom, then $x^2$ has the F distribution with 1 and $\nu$ degrees of freedom.

APPLIED EXERCISES

22. Integrate the appropriate chi-square density to find the probability that the variance of a random sample of size 5 from a normal population with $\sigma^2 = 25$ will fall between 20 and 30.

23. The claim that the variance of a normal population is $\sigma^2 = 25$ is to be rejected if the variance of a random sample of size 16 exceeds 54.668 or is less than 12.102. What is the probability that this claim will be rejected even though $\sigma^2 = 25$?

24. The claim that the variance of a normal population is $\sigma^2 = 4$ is to be rejected if the variance of a random sample of size 9 exceeds 7.7535. What is the probability that this claim will be rejected even though $\sigma^2 = 4$?

25. A random sample of size 25 from a normal population has the mean $\bar{x} = 47$ and the standard deviation $s = 7$. Basing our decision on the statistic of Theorem 8.12, can we say that the given information supports the conjecture that the mean of the population is $\mu = 42$?

26. A random sample of size 12 from a normal population has the mean $\bar{x} = 27.8$ and the variance $s^2 = 3.24$. Basing our decision on the statistic of Theorem 8.12, can we say that the given information supports the claim that the mean of the population is $\mu = 28.5$?

27. If $s_1^2$ and $s_2^2$ are the variances of independent random samples of size $n_1 = 61$ and $n_2 = 31$ from normal populations with $\sigma_1^2 = 12$ and $\sigma_2^2 = 18$, find $P(s_1^2/s_2^2 > 1.16)$.

28. If $s_1^2$ and $s_2^2$ are the variances of independent random samples of size $n_1 = 10$ and $n_2 = 15$ from normal populations with equal variances, find $P(s_1^2/s_2^2 < 4.03)$.
8.7 ORDER STATISTICS


Given a random sample of size $n$ from an infinite population with a continuous density, suppose that we arrange the values taken on by $x_1, x_2, \ldots, x_n$ according to size and look upon the smallest of the $x$'s as a value of the random variable $y_1$, the next largest as a value of the random variable $y_2$, the next largest after that as a value of the random variable $y_3, \ldots,$ and the largest as a value of the random variable $y_n$. The random variables thus defined are referred to as order statistics; in particular, $y_1$ is the first order statistic, $y_2$ is the second order statistic, $y_3$ is the third order statistic, and so on. (We limited this discussion to infinite populations with continuous densities so that the probability is zero that any two of the $x$'s will be alike.)

To be more specific, consider the case where $n = 2$ and the relationship between the values of $x_1$ and $x_2$ and $y_1$ and $y_2$ is

$y_1 = x_1$ and $y_2 = x_2$ when $x_1 < x_2$

$y_1 = x_2$ and $y_2 = x_1$ when $x_2 < x_1$

Similarly, for $n = 3$ the relationship between the values of the respective random variables is

$y_1 = x_1$, $y_2 = x_2$, and $y_3 = x_3$, when $x_1 < x_2 < x_3$

$y_1 = x_1$, $y_2 = x_3$, and $y_3 = x_2$, when $x_1 < x_3 < x_2$

⋮

$y_1 = x_3$, $y_2 = x_2$, and $y_3 = x_1$, when $x_3 < x_2 < x_1$

Let us now derive the probability density of the rth order statistic. 


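THEOREM 8.15 For random samples of size $n$ from an infinite population which has the value $f(x)$ at $x$, the probability density of the $r$th order statistic $y_r$ is given by

$$g_r(y_r) = \frac{n!}{(r-1)!\,(n-r)!}\left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r)\left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}$$

for $-\infty < y_r < \infty$.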


Proof. Suppose that the real axis is divided into three intervals, one from $-\infty$ to $y_r$, a second from $y_r$ to $y_r + h$ (where $h$ is a positive constant), and the third from $y_r + h$ to $\infty$. Then, if the density of the population we are sampling has the value $f(x)$ at $x$, the probability that $r - 1$ of the sample values fall into the first interval, one falls into the second interval, and $n - r$ fall into the third interval is

$$\frac{n!}{(r-1)!\,1!\,(n-r)!}\left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1}\left[\int_{y_r}^{y_r+h} f(x)\,dx\right]\left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r}$$

according to the formula for the multinomial distribution. Using the law of the mean from calculus, we have

$$\int_{y_r}^{y_r+h} f(x)\,dx = f(\xi) \cdot h \quad \text{where } y_r \le \xi \le y_r + h$$

and if we divide by $h$ and let $h \to 0$, we finally get

$$g_r(y_r) = \frac{n!}{(r-1)!\,(n-r)!}\left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r)\left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}$$

for $-\infty < y_r < \infty$ for the probability density of the $r$th order statistic. ▼


In particular, the distribution of $y_1$, the smallest value in a random sample of size $n$, is given by

$$g_1(y_1) = n \cdot f(y_1)\left[\int_{y_1}^{\infty} f(x)\,dx\right]^{n-1} \quad \text{for } -\infty < y_1 < \infty$$

while the distribution of $y_n$, the largest value in a random sample of size $n$, is given by

$$g_n(y_n) = n \cdot f(y_n)\left[\int_{-\infty}^{y_n} f(x)\,dx\right]^{n-1} \quad \text{for } -\infty < y_n < \infty$$

Also, in a random sample of size $n = 2m + 1$ the sample median $\tilde{x}$ is $y_{m+1}$, so that its sampling distribution is given by

$$h(\tilde{x}) = \frac{(2m+1)!}{m!\,m!}\left[\int_{-\infty}^{\tilde{x}} f(x)\,dx\right]^{m} f(\tilde{x})\left[\int_{\tilde{x}}^{\infty} f(x)\,dx\right]^{m} \quad \text{for } -\infty < \tilde{x} < \infty$$

[For random samples of size $n = 2m$ the median is defined as $\frac{1}{2}(y_m + y_{m+1})$.]

In some instances it is possible to perform the integrations required to obtain the densities of the various order statistics; for other populations there may be no choice but to approximate these integrals by using numerical methods.
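Where the integrals in Theorem 8.15 have no convenient closed form, $g_r$ can still be evaluated numerically from the population density $f$ and its distribution function $F(y) = \int_{-\infty}^{y} f(x)\,dx$. The sketch below does this, with helper names and the midpoint rule being our own choices. As a sanity check it applies the formula to the continuous uniform population on $(0, 1)$, where the density of $y_r$ should integrate to 1 and, by a standard beta-integral calculation, have mean $\frac{r}{n+1}$.

```python
import math


def order_stat_density(y, r, n, pdf, cdf):
    """g_r(y) of Theorem 8.15, with cdf(y) standing for the integral of pdf up to y."""
    F = cdf(y)
    coef = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return coef * F ** (r - 1) * pdf(y) * (1.0 - F) ** (n - r)


def midpoint_integral(fn, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of fn over (a, b)."""
    h = (b - a) / steps
    return h * sum(fn(a + (i + 0.5) * h) for i in range(steps))


def uniform_pdf(x):
    return 1.0 if 0.0 < x < 1.0 else 0.0


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


n, r = 7, 3
total = midpoint_integral(
    lambda y: order_stat_density(y, r, n, uniform_pdf, uniform_cdf), 0.0, 1.0)
mean = midpoint_integral(
    lambda y: y * order_stat_density(y, r, n, uniform_pdf, uniform_cdf), 0.0, 1.0)
print(f"integral {total:.6f}, mean {mean:.6f}, r/(n+1) = {r / (n + 1):.6f}")
```

Any other population can be used by passing its own `pdf` and `cdf`; for densities on an infinite range the integration limits must be truncated where the remaining mass is negligible.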
EXAMPLE 8.4

For random samples of size $n$ from an exponential population with the parameter $\theta$, show that the distributions of $y_1$ and $y_n$ are given by

$$g_1(y_1) = \begin{cases} \dfrac{n}{\theta}\, e^{-n y_1/\theta} & \text{for } y_1 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and

$$g_n(y_n) = \begin{cases} \dfrac{n}{\theta}\, e^{-y_n/\theta}\left[1 - e^{-y_n/\theta}\right]^{n-1} & \text{for } y_n > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and that, for random samples of size $n = 2m + 1$ from this kind of population, the sampling distribution of the median is given by

$$h(\tilde{x}) = \begin{cases} \dfrac{(2m+1)!}{m!\,m!\,\theta}\, e^{-\tilde{x}(m+1)/\theta}\left[1 - e^{-\tilde{x}/\theta}\right]^{m} & \text{for } \tilde{x} > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Solution

The integrations required to obtain these results are straightforward, and they will be left to the reader in Exercise 1 below. ▲
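A quick simulation is a useful check on Example 8.4. Since $g_1$ is itself an exponential density with parameter $\theta/n$, the smallest of $n$ exponential observations should average about $\theta/n$; for comparison the sketch also prints the average of the largest value against $\theta\left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right)$, a standard consequence of the density $g_n$. The values $n = 5$ and $\theta = 2$ are arbitrary and the function name is ours. Note that Python's `random.expovariate` takes the rate $1/\theta$, not the mean.

```python
import random


def mean_min_and_max(n, theta, reps, seed=5):
    """Average smallest and largest values in repeated samples of size n
    from an exponential population with mean theta."""
    rng = random.Random(seed)
    total_min = total_max = 0.0
    for _ in range(reps):
        xs = [rng.expovariate(1.0 / theta) for _ in range(n)]  # rate = 1/theta
        total_min += min(xs)
        total_max += max(xs)
    return total_min / reps, total_max / reps


n, theta = 5, 2.0
avg_min, avg_max = mean_min_and_max(n, theta, reps=40000)
harmonic = sum(1.0 / k for k in range(1, n + 1))
print(f"y1: {avg_min:.3f} vs theta/n = {theta / n:.3f}")
print(f"yn: {avg_max:.3f} vs theta*(1 + 1/2 + ... + 1/n) = {theta * harmonic:.3f}")
```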


The following is an interesting result about the sampling distribution of the median which holds when the population density is continuous and nonzero at the population median $\tilde{\mu}$, which is such that $\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = \frac{1}{2}$.

THEOREM 8.16 For large $n$, the sampling distribution of the median for random samples of size $2n + 1$ is approximately normal with the mean $\tilde{\mu}$ and the variance

$$\frac{1}{8[f(\tilde{\mu})]^2 n}$$
A proof of this theorem is referred to on page 305. Note that for random samples of size $2n + 1$ from a normal population we have $\mu = \tilde{\mu}$, so that

$$f(\tilde{\mu}) = f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$$

and the variance of the median is approximately $\frac{\pi\sigma^2}{4n}$. If we compare this with the variance of the mean, which for random samples of size $2n + 1$ from an infinite population is $\frac{\sigma^2}{2n + 1}$, we find that for large samples from normal populations the mean is more reliable than the median; that is, the mean is subject to smaller chance fluctuations than the median.
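The comparison between the mean and the median can be seen in a simulation. The sketch below (function name ours; $n = 50$, so samples of size 101 from a standard normal population, chosen arbitrarily) estimates the variances of both statistics over repeated samples and prints them next to $\frac{\sigma^2}{2n+1}$ and the large-sample value $\frac{\pi\sigma^2}{4n}$ from Theorem 8.16. Their ratio should be roughly $\pi/2 \approx 1.57$, though a finite simulation will not reproduce it exactly.

```python
import math
import random
import statistics


def mean_and_median_variances(size, reps, seed=6):
    """Variances, over repeated samples of the given size from N(0, 1),
    of the sample mean and of the sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
        means.append(sum(xs) / size)
        medians.append(statistics.median(xs))
    return statistics.pvariance(means), statistics.pvariance(medians)


n = 50  # samples of size 2n + 1 = 101, sigma = 1
var_mean, var_median = mean_and_median_variances(2 * n + 1, reps=4000)
print(f"mean:   {var_mean:.5f}  vs sigma^2/(2n+1) = {1 / (2 * n + 1):.5f}")
print(f"median: {var_median:.5f}  vs pi*sigma^2/(4n) = {math.pi / (4 * n):.5f}")
print(f"ratio median/mean: {var_median / var_mean:.2f}")
```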


THEORETICAL EXERCISES 


1. Verify the results given in Example 8.4 for the sampling distributions of $y_1$, $y_n$, and $\tilde{x}$ for random samples from an exponential population.


2. Find the sampling distributions of $y_1$ and $y_n$ for random samples of size $n$ from a continuous uniform population with $\alpha = 0$ and $\beta = 1$. Also find the sampling distribution of the median for random samples of size $2m + 1$ from this kind of population.


3. Find the mean and the variance of the sampling distribution of $y_1$ for random samples of size $n$ from the uniform population of Exercise 2.


4. Find the sampling distributions of $y_1$ and $y_n$ for random samples of size $n$ from a population having the beta distribution with $\alpha = 3$ and $\beta = 2$. Also find the sampling distribution of $\tilde{x}$ for random samples of size $2m + 1$ from this population.


5. Find the sampling distribution of $y_1$ for random samples of size $n = 2$ taken
(a) without replacement from the finite population which consists of the first five positive integers;
(b) with replacement from the same population.
(Hint: Enumerate all possibilities.)


6. Duplicate the method used on page 300 to show that the joint density of y₁
and yₙ is given by

\[ g(y_1, y_n) = n(n-1) f(y_1) f(y_n) \left[ \int_{y_1}^{y_n} f(x)\,dx \right]^{n-2} \qquad \text{for } -\infty < y_1 < y_n < \infty \]

and g(y₁, yₙ) = 0 elsewhere.

(a) Use this result to find the joint density of y₁ and yₙ for random samples
of size n from an exponential population.

(b) Use this result to find the joint density of y₁ and yₙ for random samples
of size n from the continuous uniform population of Exercise 2. Also
find the covariance of y₁ and yₙ.
7. Use the formula for the joint density of y₁ and yₙ given in Exercise 6 and
the method of Section 7.4 to find an expression for the joint density of y₁
and the sample range given by R = yₙ − y₁.

8. Use the result of Exercise 7 to find the sampling distribution of R for random 
samples of size n from an exponential population. 

9. Use the result of Exercise 7 to find the sampling distribution of R for random 
samples of size n from the continuous uniform population of Exercise 2. 
Also find the mean and the variance of this sampling distribution of R. 

10. There are many problems, particularly in industrial applications, in which 
we are interested in the proportion of a population that lies between certain 
limits. Such limits are called tolerance limits. The following steps lead to the 
sampling distribution of the statistic p, which is the proportion of a population 
(having a continuous density) that lies between the smallest and the largest 
values of a random sample of size n. 

(a) Use the joint density of Exercise 6 and the method of Section 7.4 to
show that the joint density of y₁ and p, whose values are given by

\[ p = \int_{y_1}^{y_n} f(x)\,dx \]

is

\[ h(y_1, p) = n(n-1) f(y_1) p^{n-2} \]


(b) Use the result of part (a) and the method of Section 7.4 to show that
the joint density of p and w, whose values are given by

\[ w = \int_{-\infty}^{y_1} f(x)\,dx \]

is

\[ \varphi(w, p) = n(n-1) p^{n-2} \]

for w > 0, p > 0, w + p < 1, and φ(w, p) = 0 elsewhere.
(c) Use the result of part (b) to show that the marginal density of p is given
by

\[ g(p) = \begin{cases} n(n-1) p^{n-2} (1 - p) & \text{for } 0 < p < 1 \\ 0 & \text{elsewhere} \end{cases} \]


This is the desired density of the proportion of the population that lies 
between the smallest and the largest values of a random sample of size 
n, and it is of interest to note that it does not depend on the form of
the population distribution. 


11. Use the result of Exercise 10 to show that

\[ E(p) = \frac{n-1}{n+1} \qquad \text{and} \qquad \sigma_p^2 = \frac{2(n-1)}{(n+1)^2 (n+2)} \]

What can we conclude from this about the distribution of p when n is very
large?
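The claim in Exercise 10 that the distribution of p does not depend on the population can be checked by simulation. This Python sketch, assuming n = 10, a continuous uniform and an exponential population, and an arbitrary seed, simulates p = F(yₙ) − F(y₁) and compares its sample mean and variance with (n − 1)/(n + 1) and 2(n − 1)/[(n + 1)²(n + 2)]; both populations should give nearly the same values.

```python
import math
import random
import statistics


def coverage_proportions(n, draw, cdf, reps=20000, seed=2):
    """Simulate p = F(y_n) - F(y_1), the proportion of the population lying
    between the smallest and the largest of n random observations."""
    rng = random.Random(seed)
    ps = []
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        ps.append(cdf(max(xs)) - cdf(min(xs)))
    return ps


n = 10
populations = {
    # name: (sampler, cumulative distribution function F)
    "uniform(0, 1)": (lambda r: r.random(), lambda x: x),
    "exponential(1)": (lambda r: r.expovariate(1.0), lambda x: 1.0 - math.exp(-x)),
}
mean_exact = (n - 1) / (n + 1)
var_exact = 2 * (n - 1) / ((n + 1) ** 2 * (n + 2))

results = {}
for name, (draw, cdf) in populations.items():
    ps = coverage_proportions(n, draw, cdf)
    results[name] = (statistics.fmean(ps), statistics.pvariance(ps))
    print(f"{name:15s} mean {results[name][0]:.4f} (exact {mean_exact:.4f}), "
          f"variance {results[name][1]:.5f} (exact {var_exact:.5f})")
```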


APPLIED EXERCISES 


12. Find the probability that in a random sample of size 4 from the continuous 
uniform population of Exercise 2 the smallest value will be at least 0.20. 


13. Find the probability that in a random sample of size 3 from the beta population 
of Exercise 4 the largest value will be less than 0.90. 


14. Use the result of Exercise 9 to find the probability that the range of a random 
sample of size 5 from the given uniform population will be at least 0.75. 


15. Use the result of part (c) of Exercise 10 to find the probability that in a
random sample of size 10 at least 80 percent of the population will fall
between the smallest and the largest values.


16. Use the result of part (c) of Exercise 10 to set up an equation in n, whose
solution would give the sample size that is required to be able to assert with
probability 1 − α that the proportion of the population contained between
the smallest and largest sample values is at least P. Show that for P = 0.90
and α = 0.05 this equation can be written as

\[ (0.90)^{n-1} = \frac{1}{2n + 18} \]

This kind of equation is difficult to solve, but it can be shown that an
approximate solution for n is given by

\[ n \approx \frac{1}{2} + \frac{1 + P}{1 - P} \cdot \frac{\chi^2_{\alpha,4}}{4} \]

where χ²_{α,4} must be looked up in Table V. Use this method to find an
approximate solution of the given equation.
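The exact condition behind this exercise can also be checked by direct search. Integrating the density of part (c) of Exercise 10 from P to 1 gives P(p ≥ P) = 1 − nP^(n−1) + (n − 1)P^n, and setting this equal to 1 − α with P = 0.90 and α = 0.05 reduces to (0.90)^(n−1) = 1/(2n + 18). The Python sketch below finds the smallest n satisfying the inequality and compares it with the approximation n ≈ 1/2 + ((1 + P)/(1 − P))·χ²_{α,4}/4. The value 9.488 for χ²_{0.05,4} is the usual tabulated value, hard-coded here as an assumption rather than computed.

```python
def prob_coverage_at_least(P, n):
    """P(p >= P) when p has the density n(n - 1)p^(n-2)(1 - p) on 0 < p < 1."""
    return 1 - n * P ** (n - 1) + (n - 1) * P ** n


def smallest_n(P, alpha):
    """Smallest sample size n for which P(p >= P) is at least 1 - alpha."""
    n = 2
    while prob_coverage_at_least(P, n) < 1 - alpha:
        n += 1
    return n


def approx_n(P, chi2_alpha_4):
    """Approximation n ~ 1/2 + (1 + P)/(1 - P) * chi2_{alpha,4} / 4."""
    return 0.5 + (1 + P) / (1 - P) * chi2_alpha_4 / 4


CHI2_005_4 = 9.488  # tabulated chi-square value for alpha = 0.05, 4 degrees of freedom

print(smallest_n(0.90, 0.05))                # 46
print(round(approx_n(0.90, CHI2_005_4), 2))  # 45.57
```

The direct search and the approximation agree once the approximate value is rounded up to the next integer.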
Decision Theory


9.1 INTRODUCTION


In Chapter 4 we introduced the concept of a mathematical expectation to study 
expected values of random variables; in particular, the moments of their distribu- 
tions. In applied situations, mathematical expectations are often used as a guide 
in choosing among alternatives, that is, in making decisions, because it is generally 
considered rational to select alternatives with the “most promising" mathematical 
expectations—the ones which maximize expected profits, minimize expected 
losses, maximize expected sales, minimize expected costs, and so on. 

Although this approach to decision making has great intuitive appeal, it is 
not without complications, for there are many problems in which it is difficult, 
if not impossible, to assign numerical values to the consequences of one’s actions 
and to the probabilities of all eventualities. 


EXAMPLE 9.1 


A manufacturer of leather goods must decide whether to expand his plant capacity 
now or wait at least another year. His advisors tell him that if he expands now 


Note: The material in this chapter provides a unified approach to statistical inference. 
However, it is not a prerequisite for the classical approach to which we devote most of 
the remainder of this text and, hence, it may be omitted without loss of continuity. 
and economic conditions remain good, there will be a profit of $164,000 during
the next fiscal year; if he expands now and there is a recession, there will be a 
loss of $40,000; if he waits at least another year and economic conditions remain 
good, there will be a profit of $80,000; and if he waits at least another year and 
there is a recession, there will be a small profit of $8,000. What should the 
manufacturer decide to do, if he wants to minimize the expected loss during the 
next fiscal year and he feels that the odds are 2 to 1 that there will be a recession? 


Solution
Schematically, all these “payoffs” can be represented as in the following
table, where the entries are the losses which correspond to the various
possibilities, and, hence, gains are represented by negative numbers:*


                                      Expand now    Delay expansion

Economic conditions remain good        −164,000         −80,000
There is a recession                     40,000          −8,000


Since the probabilities that economic conditions will remain good and
that there will be a recession are, respectively, 1/3 and 2/3, the manufacturer's
expected loss for the next fiscal year is

\[ -164{,}000 \cdot \tfrac{1}{3} + 40{,}000 \cdot \tfrac{2}{3} = -28{,}000 \]

if he expands his plant capacity now, and

\[ -80{,}000 \cdot \tfrac{1}{3} + (-8{,}000) \cdot \tfrac{2}{3} = -32{,}000 \]

if he waits at least another year. Since an expected profit (negative expected
loss) of $32,000 is preferable to an expected profit (negative expected loss)
of $28,000, it follows that the manufacturer should delay expanding the
capacity of his plant. ▲


* We are working with losses, here, rather than profits, to make this example fit the
general scheme which we shall introduce in Sections 9.2 and 9.3.

The result at which we arrived in this example assumes that the values given
in the table and also the odds for a recession are properly assessed. As the reader 
will be asked to show in Exercises 2 and 3 on page 316, changes in these quantities 
can easily lead to different results. 
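The expected-loss comparison of Example 9.1 is easy to reproduce, and re-running it with other probabilities of a recession is a quick way to explore the sensitivity just mentioned. A minimal Python sketch, assuming the four losses of Example 9.1 and exact arithmetic with fractions (the action and state names are labels chosen for this illustration):

```python
from fractions import Fraction

# Losses (negative numbers are gains) from Example 9.1, keyed by (action, state).
LOSSES = {
    ("expand now", "good"): -164_000,
    ("expand now", "recession"): 40_000,
    ("delay", "good"): -80_000,
    ("delay", "recession"): -8_000,
}


def expected_losses(p_recession, losses=LOSSES):
    """Expected loss of each action when a recession has probability p_recession."""
    probs = {"good": 1 - p_recession, "recession": p_recession}
    actions = sorted({action for action, _ in losses})
    return {a: sum(probs[s] * losses[(a, s)] for s in probs) for a in actions}


def best_action(p_recession, losses=LOSSES):
    """Action with the smallest expected loss."""
    exp = expected_losses(p_recession, losses)
    return min(exp, key=exp.get)


p = Fraction(2, 3)  # odds of 2 to 1 for a recession
for action, loss in expected_losses(p).items():
    print(action, loss)  # delay -32000, then expand now -28000
print(best_action(p))    # delay
```

Passing a different probability (or a modified loss table) to best_action shows how easily the decision can change.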

Let us also examine briefly what we might do in a situation like that described 
in Example 9.1, when we have no idea about the probabilities associated with 
the various eventualities. 


EXAMPLE 9.2 


With reference to Example 9.1, suppose that the manufacturer has no idea about 
the odds that there will be a recession. What should he decide to do, if he is a 
confirmed pessimist? 


Solution 


Being the kind of person who always expects the worst to happen, he might 
argue that if he expands his plant capacity now he could lose $40,000, if 
he delays expansion there would be a profit of at least $8,000, and, hence, 
that he will minimize the maximum loss (or maximize the minimum profit) 
if he waits at least another year. ▲


The criterion used in this example is called the minimax criterion, and it is 
only one of many different criteria that can be used in this kind of situation. One 
such criterion, based on optimism rather than pessimism, is referred to in Exercise 
7 on page 317, and another, based on the fear of "losing out on a good deal," 
is referred to in Exercise 8 on page 317. 
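The pessimist's reasoning in Example 9.2 amounts to finding, for each action, the largest loss it can lead to, and choosing the action for which this largest loss is smallest. A short Python sketch of this minimax rule, with the losses of Example 9.1 written out as an assumed input:

```python
# Losses (negative numbers are gains) from Example 9.1, keyed by (action, state).
LOSSES = {
    ("expand now", "good"): -164_000,
    ("expand now", "recession"): 40_000,
    ("delay", "good"): -80_000,
    ("delay", "recession"): -8_000,
}


def minimax_action(losses):
    """Choose the action whose worst (largest) possible loss is smallest."""
    worst = {}
    for (action, _state), loss in losses.items():
        worst[action] = max(worst.get(action, loss), loss)
    return min(worst, key=worst.get), worst


choice, worst = minimax_action(LOSSES)
print(worst)   # {'expand now': 40000, 'delay': -8000}
print(choice)  # delay
```

Note that no probabilities enter this rule, which is exactly why it can be used when the odds of a recession are unknown.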


9.2 THE THEORY OF GAMES 


The examples of the preceding section may well have given the impression that 
the manufacturer is playing a game—a game between him and Nature (or call 
it fate or whatever “controls” whether there will be a recession). Each of the 
"players" has the choice of two moves: The manufacturer has the choice between 
actions a₁ and a₂ (to expand his plant capacity now or to delay expansion for
at least a year) and Nature controls the choice between θ₁ and θ₂ (whether
economic conditions are to remain good or whether there is to be a recession).


Depending on the choice of their moves, there are the “payoffs” shown in the 
following table: 
                        Player A (The Manufacturer)
                            a₁              a₂

Player B        θ₁     L(a₁, θ₁)       L(a₂, θ₁)
(Nature)
                θ₂     L(a₁, θ₂)       L(a₂, θ₂)


The amounts L(a₁, θ₁), L(a₂, θ₁), ..., are referred to as the values of the loss
function which characterizes the particular "game"; in other words, L(aᵢ, θⱼ) is
the loss of Player A (the amount he has to pay Player B) when he chooses
alternative aᵢ and Player B chooses alternative θⱼ. Although it does not really
matter, we shall assume here that these amounts are in dollars. In actual practice,
they can also be expressed in terms of any goods or services, in units of utility
(desirability or satisfaction), and even in terms of life or death (as in Russian
roulette or in the conduct of a war).

The analogy we have drawn here is not really far-fetched; the problem of 
Example 9.1 is typical of the kind of situation treated in the theory of games, a 
relatively new branch of mathematics which has stimulated considerable interest 
in recent years. This theory is not limited to parlor games, as its name might 
suggest, but it applies to any kind of competitive situation and, as we shall see, 
it has led to a unified approach to solving problems of statistical inference. 

To introduce some of the basic concepts of the theory of games, let us begin 
by explaining what we mean by a zero-sum two-person game. In this term, 
“two-person” means that there are two players (or, more generally, two parties 
with conflicting interests), and “zero-sum” means that whatever one player loses 
the other player wins. Thus, in a zero-sum game there is no “cut for the house" 
as in professional gambling, and no capital is created or destroyed during the 
course of play. Of course, the theory of games also includes games which are 
neither zero-sum nor limited to two players, but as can well be imagined, such 
games are generally much more complicated. Exercise 19 on page 319 is an 
example of a game which is not zero-sum. 

Games are also classified according to the number of strategies (moves, 
choices, or alternatives) each player has at his disposal. For instance, if each 
player has to choose one of two alternatives (as in Example 9.1), we say that it 
is a 2 x 2 game; if one player has 3 possible moves while the other has 4, the 
game is 3 x 4 or 4 x 3, as the case may be. In this section we shall consider only 
finite games, that is, games in which each player has only a finite, or fixed, number 
of possible moves, but later we shall consider also games where each player has 
infinitely many moves. 

It is customary in the theory of games to refer to the two players as Player
A and Player B as we did in the table above, but the moves (choices, or
alternatives) of Player A are usually labeled I, II, III, ..., instead of
a₁, a₂, a₃, ..., and those of Player B are usually labeled 1, 2, 3, ..., instead of
θ₁, θ₂, θ₃, .... The payoffs, the amounts of money which change hands when the
players choose their respective strategies, are usually shown in a table like that
on page 309, which is referred to as a payoff matrix in the theory of games. (As
before, positive payoffs represent losses of Player A and negative payoffs represent
losses of Player B.) Let us also add that it is always assumed in the theory of
games that each player must choose his strategy without knowing what his
opponent is going to do, and that once a player has made his choice, it cannot
be changed.

The objectives of the theory of games are to determine optimum strategies
(namely, strategies which are most profitable to the respective players) and the
corresponding payoff, which is called the value of the game.


EXAMPLE 9.3 


Given the 2 x 2 zero-sum two-person game 


Player A
I II


… Strategy I, since a loss of $8 is obviously preferable to a loss of …
the value of the game, the payoff corresponding to Strategies I and 2,
is $8. ▲
EXAMPLE 9.4


Given the 3 x 2 zero-sum two-person game


                  Player A
              I      II     III

Player B  1  −4       1      7
          2   4       3      5


find the optimum strategies of Players A and B and the value of the game.


Solution 


In this game neither strategy of Player B dominates the other, but the third 
strategy of Player A is dominated by each of the other two—clearly, a profit 
of $4 or a loss of $1 is preferable to a loss of $7, and a loss of $4 or a 
loss of $3 is preferable to a loss of $5. Thus, we can discard the third column 
of the payoff matrix and study the 2 x 2 game 


                  Player A
              I      II

Player B  1  −4       1
          2   4       3


where now Strategy 2 of Player B dominates Strategy 1. Thus, the optimum
choice of Player B is Strategy 2, the optimum choice of Player A is Strategy
II (since a loss of $3 is preferable to a loss of $4), and the value of the
game is $3. ▲
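Discarding dominated strategies can be done mechanically, repeating the check until nothing more can be removed. A Python sketch, assuming the payoff matrix of Example 9.4 is stored with rows for Player B's Strategies 1 and 2 and columns for Player A's Strategies I, II, and III (entries are Player A's losses); the rule used is ordinary dominance, meaning at least as good against every opposing strategy and strictly better against at least one.

```python
# Rows: Player B's strategies 1, 2; columns: Player A's strategies I, II, III.
# Entries are Player A's losses (what A pays B), as in Example 9.4.
GAME_9_4 = [
    [-4, 1, 7],
    [4, 3, 5],
]


def dominates(u, v, larger_is_better):
    """True if u is at least as good as v in every position and strictly better in one."""
    if larger_is_better:
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def reduce_by_dominance(payoff):
    """Repeatedly discard dominated strategies; return surviving row and column indices."""
    rows = list(range(len(payoff)))
    cols = list(range(len(payoff[0])))
    changed = True
    while changed:
        changed = False
        # Player A pays the entries, so he prefers columns with smaller entries.
        for c in cols:
            col_c = [payoff[i][c] for i in rows]
            if any(dominates([payoff[i][d] for i in rows], col_c, larger_is_better=False)
                   for d in cols if d != c):
                cols.remove(c)
                changed = True
                break
        if changed:
            continue
        # Player B receives the entries, so he prefers rows with larger entries.
        for r in rows:
            row_r = [payoff[r][j] for j in cols]
            if any(dominates([payoff[s][j] for j in cols], row_r, larger_is_better=True)
                   for s in rows if s != r):
                rows.remove(r)
                changed = True
                break
    return rows, cols


rows, cols = reduce_by_dominance(GAME_9_4)
print(rows, cols)                  # [1] [1], i.e. Strategy 2 for B and Strategy II for A
print(GAME_9_4[rows[0]][cols[0]])  # 3
```

Here the repeated elimination happens to leave a single pair of strategies; as the text goes on to note, this is the exception rather than the rule.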


The process of discarding dominated strategies can be of great help in the
solution of a game (that is, in finding optimum strategies and the value of the
game), but it is the exception rather than the rule that it will lead to a complete
solution. Dominances may not even exist, as is illustrated by the following 3 x 3
zero-sum two-person game:
So, we must look for other ways of arriving at optimum strategies. From the
point of view of Player A, we might argue as follows: If he chooses Strategy I, 
the worst that can happen is that he loses $2; if he chooses Strategy II, the worst 
that can happen is that he loses $6; and if he chooses Strategy III, the worst that 
can happen is that he loses $12. Thus, he could minimize the maximum loss by 
choosing Strategy I. 

Applying the same kind of argument to select a strategy for Player B, we 
find that if he chooses Strategy 1, the worst that can happen is that he loses $2; 
if he chooses Strategy 2, the worst that can happen is that he wins $2; and if he 
chooses Strategy 3, the worst that can happen is that he loses $6. Thus, he could 
minimize the maximum loss (or maximize the minimum gain, which is the same) 
by choosing Strategy 2. 

The selection of Strategies I and 2, appropriately called minimax strategies 
(or strategies based on the minimax criterion), is really quite reasonable. By 
choosing Strategy I, Player A makes sure that his opponent can win at most $2, 
and by choosing Strategy 2, Player B makes sure that he will actually win this 
amount. This $2 is the value of the game, which means that the game favors 
Player B, but we could make it equitable by charging Player B $2 for the privilege 
of playing the game, and giving the $2 to Player A. 

A very important aspect of the minimax strategies I and 2 of this example
is that they are completely "spyproof" in the sense that neither player can profit
from knowing the other's choice. In our example, even if Player A announced
publicly that he will choose Strategy I, it would still be best for Player B to
choose Strategy 2, and if Player B announced publicly that he will choose Strategy
2, it would still be best for Player A to choose Strategy I. Unfortunately, not all
games are "spyproof."


EXAMPLE 9.5


Show that the minimax strategies of Players A and B are not spyproof in the 
following game: 
                  Player A
              I      II

Player B  1   8      −5
          2   2       6


Solution 


Player A can minimize his maximum loss by choosing Strategy II, and 
Player B can minimize his maximum loss by choosing Strategy 2. However, 
if Player A knew that Player B was going to base his choice on the minimax 
criterion, he could switch to Strategy I and thus reduce his loss from $6 to 
$2. Of course, if Player B discovered that Player A would try to outsmart 
him in this way, he could in turn switch to Strategy 1 and increase his gain 
to $8. In any case, the minimax strategies of the two players are not spyproof, 
thus leaving room for all sorts of trickery or deception. ▲
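The minimax choices and the spyproof check of Example 9.5 can be written out directly. In this Python sketch the matrix layout (rows for Player B's strategies, columns for Player A's, entries equal to Player A's losses) is an assumption of the illustration, and a pair of minimax strategies is treated as spyproof when neither player could do better by switching after learning the other's choice.

```python
# Rows: Player B's strategies 1, 2; columns: Player A's strategies I, II.
# Entries are Player A's losses (what A pays B), as in Example 9.5.
GAME_9_5 = [
    [8, -5],
    [2, 6],
]


def minimax_choices(payoff):
    """Return (column chosen by A, row chosen by B) under the minimax criterion."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Player A looks at the largest loss in each column and picks the smallest.
    col_max = [max(payoff[i][j] for i in range(n_rows)) for j in range(n_cols)]
    a_col = min(range(n_cols), key=col_max.__getitem__)
    # Player B looks at the smallest gain in each row and picks the largest.
    row_min = [min(row) for row in payoff]
    b_row = max(range(n_rows), key=row_min.__getitem__)
    return a_col, b_row


def is_spyproof(payoff, a_col, b_row):
    """True if neither player gains by switching once he knows the other's choice."""
    value = payoff[b_row][a_col]
    a_best_reply = min(payoff[b_row])                 # A's smallest loss against row b_row
    b_best_reply = max(row[a_col] for row in payoff)  # B's largest gain against column a_col
    return value == a_best_reply and value == b_best_reply


a_col, b_row = minimax_choices(GAME_9_5)
print(a_col, b_row)                         # 1 1, i.e. Strategy II for A and Strategy 2 for B
print(is_spyproof(GAME_9_5, a_col, b_row))  # False
```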


There exists an easy way of determining for any given game whether minimax 
strategies are spyproof. What we have to look for are saddle points, namely, pairs 
of strategies for which the corresponding entry in the payoff matrix is the smallest 
value of its row and the greatest value of its column. In Example 9.5 there is no 
saddle point since the smallest value of each row is also the smallest value of its 
column. On the other hand, in the game of Example 9.3 there is a saddle point 
corresponding to Strategies I and 2 since 8, the smallest value of the second row,
is the greatest value of the first column. Also, the 3 x 2 game of Example 9.4 
has a saddle point corresponding to Strategies II and 2 since 3, the smallest value 
of the second row, is the greatest value of the second column, and the 3 x 3 
game on page 312 has a saddle point corresponding to Strategies I and 2 since 
2, the smallest value of the second row, is the greatest value of the first column. 
In general, if a game has a saddle point it is said to be strictly determined, and 
the strategies corresponding to the saddle point are spyproof (and, hence, 
optimum) minimax strategies. The fact that there can be more than one saddle 
point in a game is illustrated in Exercise 1 on page 316; it also follows from that 
exercise that it does not matter in that case which of the saddle points is used 
to determine the optimum strategies of the two players. 
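The saddle-point test described above is easy to automate. This Python sketch, again using the assumed layout of rows for Player B's strategies, columns for Player A's, and entries equal to Player A's losses, lists every entry that is the smallest value of its row and the greatest value of its column. It is applied to the game of Example 9.5 and to the loss table of Example 9.1, where the rows stand for Nature's choices and the columns for the manufacturer's actions.

```python
def saddle_points(payoff):
    """Return (row, column, value) for every entry that is the smallest value
    of its row and the greatest value of its column."""
    points = []
    for i, row in enumerate(payoff):
        for j, value in enumerate(row):
            column = [r[j] for r in payoff]
            if value == min(row) and value == max(column):
                points.append((i, j, value))
    return points


# Example 9.5: rows are Player B's strategies 1, 2; columns are Player A's I, II.
GAME_9_5 = [[8, -5], [2, 6]]
# Example 9.1: rows are Nature's choices (good, recession);
# columns are the manufacturer's actions (expand now, delay).
GAME_9_1 = [[-164_000, -80_000], [40_000, -8_000]]

print(saddle_points(GAME_9_5))  # []
print(saddle_points(GAME_9_1))  # [(1, 1, -8000)]
```

The empty list for Example 9.5 matches the statement that this game has no saddle point, while the loss table of Example 9.1 has one at "recession" and "delay".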

If a game does not have a saddle point, minimax strategies are not spyproof, 
and each player can outsmart the other if he knows how his opponent will react 
in a given situation. To avoid this possibility, it suggests itself that each player 
should somehow mix up his behavior patterns intentionally, and the best way of 
doing this is by introducing an element of chance into the selection of his strategy. 
EXAMPLE 9.6


With reference to the game of Example 9.5, suppose that Player A uses a gambling
device (dice, cards, numbered slips of paper, a table of random numbers) which
leads to the choice of Strategy I with probability x, and to the choice of Strategy
II with probability 1 − x. Find the value of x which will minimize Player A's
maximum expected loss.


Solution 


If Player B chooses Strategy 1, Player A can expect to lose

\[ E = 8x - 5(1 - x) \]

dollars, and if Player B chooses Strategy 2, Player A can expect to lose

\[ E = 2x + 6(1 - x) \]

dollars. Graphically, this situation is described in Figure 9.1, where we
have plotted the lines whose equations are E = 8x − 5(1 − x) and
E = 2x + 6(1 − x) for values of x from 0 to 1.

Applying the minimax criterion to the expected losses of Player A,
we find from Figure 9.1 that the greater of the two values of E for any
given value of x is smallest where the two lines intersect, and to find the
corresponding value of x we have only to solve the equation

\[ 8x - 5(1 - x) = 2x + 6(1 - x) \]

which simplifies to 13x − 5 = 6 − 4x and yields x = 11/17. Thus, if Player A
uses eleven slips of paper numbered I and six slips of paper numbered II,
shuffles them thoroughly, and then acts according to which kind he randomly
draws, he will be holding his maximum expected loss down to

\[ 8 \cdot \tfrac{11}{17} - 5 \cdot \tfrac{6}{17} = \tfrac{58}{17} \]

or $3.41 to the nearest cent. ▲


So far as Player B of the preceding example is concerned, in Exercise 14
on page 318 the reader will be asked to use a similar argument to show that
Player B will maximize his minimum gain (which is the same as minimizing his
maximum loss) by choosing between Strategies 1 and 2 with respective
probabilities of 4/17 and 13/17, and that he will thus assure for himself an expected
gain of 58/17, or $3.41 to the nearest cent. Incidentally, the $3.41 to which Player A can
hold down his expected loss and Player B can raise his expected gain is called
the value of this game. Also, if a player's ultimate choice is thus left to chance,
his overall strategy is referred to as randomized or mixed, whereas the original
Strategies I, II, 1, and 2 are referred to as pure.
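For a 2 x 2 game without a saddle point, the intersection argument of Example 9.6 can be put in closed form by equating the two expected losses of each player. A minimal Python sketch, assuming rows for Player B's Strategies 1 and 2 and columns for Player A's Strategies I and II (entries are Player A's losses); it uses exact fractions and is meant only for games without a saddle point, since for other games the resulting numbers need not be valid probabilities.

```python
from fractions import Fraction


def solve_2x2_mixed(payoff):
    """Equalizing mixed strategies for a 2 x 2 game without a saddle point.

    payoff[i][j] is Player A's loss when B plays row i and A plays column j.
    Returns (x, y, value): x = P(A plays I), y = P(B plays 1), and the value."""
    (a11, a12), (a21, a22) = payoff
    denom = a11 - a12 - a21 + a22
    if denom == 0:
        raise ValueError("equalizing strategies are not defined for this game")
    # A's expected losses against rows 1 and 2 are equal at this x.
    x = Fraction(a22 - a12, denom)
    # B's expected gains against columns I and II are equal at this y.
    y = Fraction(a22 - a21, denom)
    value = a11 * x + a12 * (1 - x)
    return x, y, value


x, y, value = solve_2x2_mixed([[8, -5], [2, 6]])  # game of Example 9.5
print(x, y, value)             # 11/17 4/17 58/17
print(round(float(value), 2))  # 3.41
```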
Figure 9.1 Diagram for Example 9.6.


The examples of this section were all given without any “physical” interpre- 
tation because we were interested only in introducing some of the basic concepts 
of the theory of games. If we apply these methods to Example 9.1, we find that 
the “game” has a saddle point and that the manufacturer’s minimax strategy is 
to delay expanding the capacity of his plant. Of course, this assumes, questionably 
so, that Nature (which controls whether there is going to be a recession) is a 
malevolent opponent. Also, it would seem that in a situation like this the manager 
ought to have some idea about the chances for a recession, and, hence, that the
problem should be solved by the first method of Section 9.1. 


THEORETICAL EXERCISE 


1. If a zero-sum two-person game has a saddle point corresponding to Strategies
I and 4 and another corresponding to Strategies III and 2, show that
(a) there are also saddle points corresponding to Strategies I and 2, and 
Strategies III and 4; 
(b) the payoff must be the same for all four of these saddle points. 


APPLIED EXERCISES


2. With reference to Example 9.1 on page 306, what decision would minimize
the manufacturer's expected loss if he felt that
(a) the odds for a recession are 3 to 2;
(b) the odds for a recession are 7 to 4?


3. With reference to Example 9.1 on page 306, would the manufacturer's decision
remain the same if

(a) the $164,000 profit is replaced by a $200,000 profit and the odds are 2
to 1 that there will be a recession;

(b) the $40,000 loss is replaced by a $60,000 loss and the odds are 3 to 2
that there will be a recession?


4. Ms. Cooper is planning to attend a convention in Honolulu, and she must
send in her room reservation immediately. The convention is so large that
the activities are held partly in Hotel X and partly in Hotel Y, and Ms.
Cooper does not know whether the particular session she wants to attend
will be held at Hotel X or Hotel Y. She is planning to stay only one night,
which would cost her $66.00 at Hotel X and $62.40 at Hotel Y, and it will
cost her an extra $6.00 for cab fare if she stays at the wrong hotel.

(a) If Ms. Cooper feels that the odds are 3 to 1 that the session she wants
to attend will be held at Hotel X, where should she make her reservation
so as to minimize her expected cost?

(b) If Ms. Cooper feels that the odds are 5 to 1 that the session she wants
to attend will be held at Hotel X, where should she make her reservation
so as to minimize her expected cost?

5. A truck driver has to deliver a load of lumber to one of two construction
sites, which are, respectively, 27 and 33 miles from the lumberyard, but he
has misplaced the order telling him where the load of lumber should go. The
two construction sites are 12 miles apart, and, to complicate matters, the
telephone at the lumberyard is out of order. Where should he go first if he

wants to minimize the distance he can expect to drive and he feels that 

(a) the odds are 5 to 1 that the lumber should go to the construction site 
which is 33 miles from the lumberyard; 

(b) the odds are 2 to 1 that the lumber should go to the construction site 
which is 33 miles from the lumberyard; 

(c) the odds are 3 to 1 that the lumber should go to the construction site 
which is 33 miles from the lumberyard? 


6. Basing their decisions on pessimism as in Example 9.2 on page 308, where
should
(a) Ms. Cooper of Exercise 4 make her reservation; 
(b) the truck driver of Exercise 5 go first? 


7. Basing their decisions on optimism (that is, maximizing maximum gains or
minimizing minimum losses), what decisions should be reached by
(a) the manufacturer of Example 9.1 on page 306; 

(b) Ms. Cooper of Exercise 4; 

(c) the truck driver of Exercise 5? 


8. Suppose that the manufacturer of Example 9.1 on page 306 is the kind of
person who always worries about losing out on a good deal. For instance,
he finds that if he delays expansion and economic conditions remain good,
he will lose out by $84,000 (the difference between the $164,000 profit he
would have made if he had decided to expand right away, and the $80,000
profit he will actually make). Referring to this quantity as an opportunity loss,
or regret, find

(a) the opportunity losses corresponding to the other three possibilities;
(b) which decision would minimize the manufacturer's maximum loss of
opportunity.


9. With reference to the definition of Exercise 8, find which decisions will
minimize the maximum opportunity loss of


(a) Ms. Cooper of Exercise 4; 
(b) the truck driver of Exercise 5. 


10. With reference to Example 9.1 on page 306, suppose that the manufacturer
has the option of hiring an infallible forecaster for $15,000 to find out for 
certain whether there will be a recession. Based on the original 2 to 1 odds 
that there will be a recession, would it be worthwhile for the manufacturer 
to spend this $15,000? 

11. Each of the following is the payoff matrix (the payments Player A makes to
Player B) for a zero-sum two-person game. Eliminate all dominated strategies
and determine the optimum strategy for each player as well as the value of
the game:

(a)          (b)          (c)


12. Each of the following is the payoff matrix of a zero-sum two-person game.
Find the saddle point (or saddle points) and the value of each game:


13. A small town has two service stations, which share the town's market for
gasoline. The owner of Station A is debating whether or not to give away
free glasses to her customers as part of a promotional scheme, and the owner 
of Station B is debating whether or not to give away free steak knives. They 
know (from similar situations elsewhere) that if Station A gives away free 
glasses and Station B does not give away free steak knives, Station A's share 
of the market will increase by 6 percent; if Station B gives away free steak 
knives and Station A does not give away free glasses, Station B's share of 
the market will increase by 8 percent; and if both stations give away the 
respective items, Station B's share of the market will increase by 3 percent. 
(a) Present this information in the form of a payoff table, in which the 
entries are Station A's losses in its share of the market. 
(b) Find optimum strategies for the owners of the two stations. 


14. Verify the probabilities of 4/17 and 13/17 given on page 314 for the randomized
strategy of Player B. 
15. The following is the payoff matrix of a 2 x 2 zero-sum two-person game:


(a) What randomized strategy should Player A use so as to minimize his 
maximum expected loss? 
(b) What randomized strategy should Player B use so as to maximize his 
minimum expected gain? 
(c) What is the value of the game? 
16. With reference to Exercise 4, which randomized strategy will minimize Ms. 
Cooper's maximum expected cost? 


17. A country has two airfields with installations worth $2,000,000 and 
$10,000,000, respectively, of which it can defend only one against an attack 
by its enemy. The enemy, on the other hand, can attack only one of these 
airfields and take it successfully only if it is left undefended. Considering 
the “payoff” to the country to be the total value of the installations it holds 
after the attack, find the optimum strategy of the country as well as that of 
its enemy, and the value of the “game.” 


18. Two persons agree to play the following game: The first writes either 1 or 4 
on a slip of paper and at the same time the second writes either 0 or 3 on 
another slip of paper. If the sum of the two numbers is odd, the first wins 
this amount in dollars; otherwise, the second wins $2. 

(a) Construct the payoff matrix in which the payoffs are the first person’s 
losses. 

(b) What randomized decision procedure should the first person use so as 
to minimize her maximum expected loss? 

(c) What randomized decision procedure should the second person use so 
as to maximize his minimum expected gain? 


19. There are two gas stations in a certain block, and the owner of the first station 
knows that if neither station lowers its prices, he can expect a net profit of 
$100 on any given day. If he lowers his prices while the other station does
not, he can expect a net profit of $140; if he does not lower his prices but 
the other station does, he can expect a net profit of $70; and if both stations 
participate in this “price war," he can expect a net profit of $80. The owners 
of the two gas stations decide independently what prices to charge on any 
given day, and it is assumed that they cannot change their prices after they
discover those charged by the other.

(a) Should the owner of the first gas station charge his regular prices or
should he lower them, if he wants to maximize his minimum net profit? 

(b) Assuming that the above profit figures apply also to the second gas 
station, how might the owners of the gas stations collude so that each 
could expect a net profit of $105? 

Note that this "game" is not zero-sum, so that the possibility of collusion
opens entirely new possibilities.


9.3 STATISTICAL GAMES


In statistical inference we base decisions about populations on sample data, and 
it is by no means far-fetched to look upon such an inference as a game between 
Nature, which controls the relevant feature (or features) of the population, and 
the person (scientist, or statistician) who must arrive at some decision about 
Nature's choice. For instance, if we want to estimate the mean μ of a normal
population on the basis of a random sample of size n, we could say that Nature
has control over the "true" value of μ. On the other hand, we might estimate μ
in terms of the value of the sample mean or that of the sample median, and 
presumably there is some penalty or reward which depends on the size of our error. 

In spite of the obvious similarity between this problem and the ones of the 
preceding section, there are essentially two features in which statistical games 
are different. First, there is the question which we already met when we tried to 
apply the theory of games to the decision problem of Example 9.1 on page 306, 
namely, the question of whether it is reasonable to treat Nature as a malevolent 
opponent. Obviously not, but this does not simplify matters; if we could treat 
Nature as a rational opponent, we would know, at least, what to expect. 

The other distinction is that in the games of Section 9.2 each player had to 
choose his strategy without any knowledge of what his opponent had done or 
was planning to do, whereas in a statistical game the statistician is supplied with 
sample data which provide him with some information about Nature's choice. 
This also complicates matters, but it merely amounts to the fact that we are 
dealing with more complicated kinds of games. To illustrate, let us consider the 
following decision problem: We are told that a coin is either balanced with heads 
on one side and tails on the other or two-headed. We cannot inspect the coin, but 
we can flip it once and observe whether it comes up heads or tails. Then we must 
decide whether or not it is two-headed, keeping in mind that there is a penalty of 
$1 if our decision is wrong, and no penalty (or reward) if our decision is right. If 
we ignored the fact that we can observe one flip of the coin, we could treat the 
problem as the following game: 
                        Player A (The Statistician)
                            a₁                 a₂

Player B        θ₁     L(a₁, θ₁) = 0      L(a₂, θ₁) = 1
(Nature)        θ₂     L(a₁, θ₂) = 1      L(a₂, θ₂) = 0

which should remind the reader of the scheme on page 309. Now, θ₁ is the "state
of Nature" that the coin is two-headed, θ₂ is the "state of Nature" that the coin
is balanced with heads on one side and tails on the other, a₁ is the statistician's
decision that the coin is two-headed, and a₂ is the statistician's decision that the
coin is balanced with heads on one side and tails on the other. The entries in
the table are the corresponding values of the given loss function.

Now let us consider also the fact that we (Player A, or the statistician)
know what happened in the flip of the coin; that is, we know whether a random
variable X has taken on the value x = 0 (heads) or x = 1 (tails). Since we shall
want to make use of this information in choosing between a₁ and a₂, we need
a function, a decision function, which tells us what action to take when x = 0
and what action to take when x = 1. One possibility is to choose a₁ when x = 0
and a₂ when x = 1, and we can express this symbolically by writing


$$ d_1(x) = \begin{cases} a_1 & \text{when } x = 0 \\ a_2 & \text{when } x = 1 \end{cases} $$

or more simply d₁(0) = a₁ and d₁(1) = a₂. The purpose of the subscript is to
distinguish this decision function from others, for instance, from

d₂(0) = a₁ and d₂(1) = a₁

which tells us to choose a₁ regardless of the outcome of the experiment, from

d₃(0) = a₂ and d₃(1) = a₂

which tells us to choose a₂ regardless of the outcome of the experiment, and from

d₄(0) = a₂ and d₄(1) = a₁

which tells us to choose a₂ when x = 0 and a₁ when x = 1.


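To compare the merits of all these decision functions, let us first determine
the expected losses to which they lead for the various strategies of Nature, namely,
the values of the risk function

$$ R(d, \theta) = E\{L[d(X), \theta]\} $$

where the expectation is taken with respect to the random variable X. Since the
probabilities for x = 0 and x = 1 are, respectively, 1 and 0 for θ₁, and 1/2 and 1/2
for θ₂, we get

$$ \begin{aligned}
R(d_1, \theta_1) &= 1 \cdot L(a_1, \theta_1) + 0 \cdot L(a_2, \theta_1) = 1 \cdot 0 + 0 \cdot 1 = 0 \\
R(d_1, \theta_2) &= \tfrac{1}{2} \cdot L(a_1, \theta_2) + \tfrac{1}{2} \cdot L(a_2, \theta_2) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 0 = \tfrac{1}{2} \\
R(d_2, \theta_1) &= 1 \cdot L(a_1, \theta_1) + 0 \cdot L(a_1, \theta_1) = 1 \cdot 0 + 0 \cdot 0 = 0 \\
R(d_2, \theta_2) &= \tfrac{1}{2} \cdot L(a_1, \theta_2) + \tfrac{1}{2} \cdot L(a_1, \theta_2) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 = 1 \\
R(d_3, \theta_1) &= 1 \cdot L(a_2, \theta_1) + 0 \cdot L(a_2, \theta_1) = 1 \cdot 1 + 0 \cdot 1 = 1 \\
R(d_3, \theta_2) &= \tfrac{1}{2} \cdot L(a_2, \theta_2) + \tfrac{1}{2} \cdot L(a_2, \theta_2) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0 = 0 \\
R(d_4, \theta_1) &= 1 \cdot L(a_2, \theta_1) + 0 \cdot L(a_1, \theta_1) = 1 \cdot 1 + 0 \cdot 0 = 1 \\
R(d_4, \theta_2) &= \tfrac{1}{2} \cdot L(a_2, \theta_2) + \tfrac{1}{2} \cdot L(a_1, \theta_2) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1 = \tfrac{1}{2}
\end{aligned} $$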


where the values of the loss function were obtained from the table on page 321. 
We have thus arrived at the following 4 × 2 zero-sum two-person game, in
which the payoffs are the corresponding values of the risk function:

                        Player A (The Statistician)
                         d₁      d₂      d₃      d₄

Player B        θ₁       0       0       1       1
(Nature)        θ₂      1/2      1       0      1/2

Looking at this table, we find that d₁ dominates both d₂ and d₄, so that d₂ and d₄
are inadmissible. Actually, this should not come as a surprise, since in d₂ as well as
d₄ we accept the alternative a₁ (that the coin is two-headed) even though it came
up tails.
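The risk table and the dominance argument can be checked mechanically. The following Python sketch is only an illustration (the names `LOSS`, `PROB`, `DECISIONS`, `risk`, and `dominated` are hypothetical, not from the text): it encodes the loss table, the probabilities of heads and tails under θ₁ and θ₂, and the four decision functions, and then lists the decision functions that no other one dominates. Exact fractions keep the risks identical to the table entries.

```python
from fractions import Fraction

# Loss L(a, theta): 0 for a correct decision, $1 for a wrong one.
# theta1 = two-headed coin, theta2 = balanced coin;
# a1 = "decide two-headed", a2 = "decide balanced".
LOSS = {("a1", "theta1"): 0, ("a2", "theta1"): 1,
        ("a1", "theta2"): 1, ("a2", "theta2"): 0}

# P(X = x | theta), where x = 0 means heads and x = 1 means tails.
PROB = {"theta1": {0: Fraction(1), 1: Fraction(0)},
        "theta2": {0: Fraction(1, 2), 1: Fraction(1, 2)}}

# The four decision functions d1, ..., d4 as maps from x to an action.
DECISIONS = {"d1": {0: "a1", 1: "a2"}, "d2": {0: "a1", 1: "a1"},
             "d3": {0: "a2", 1: "a2"}, "d4": {0: "a2", 1: "a1"}}


def risk(d, theta):
    """R(d, theta) = E{L[d(X), theta]}, summed over the two possible outcomes."""
    return sum(PROB[theta][x] * LOSS[(DECISIONS[d][x], theta)] for x in (0, 1))


# Risks as (theta1, theta2) pairs, one per decision function.
table = {d: (risk(d, "theta1"), risk(d, "theta2")) for d in DECISIONS}


def dominated(d):
    """True if another decision function is never worse than d and sometimes better."""
    r = table[d]
    return any(all(o <= s for o, s in zip(table[e], r)) and table[e] != r
               for e in DECISIONS if e != d)


admissible = [d for d in DECISIONS if not dominated(d)]
print(admissible)  # ['d1', 'd3']
```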

This leaves us with the 2 × 2 zero-sum two-person game in which Player
A has to choose between d₁ and d₃. It can easily be verified that if Nature is
looked upon as a malevolent opponent, the optimum strategy is to randomize
between d₁ and d₃ with respective probabilities of 2/3 and 1/3, and the value of the
game (the expected risk) is 1/3 of a dollar. If Nature is not looked upon as a
malevolent opponent, some other criterion will have to be used for choosing
between d₁ and d₃, and this will be discussed in the sections which follow.
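For the remaining 2 × 2 game, the optimum mixture can be found by equalizing Player A's expected risk over the two states of Nature. The small Python sketch below (the function name `equalizing_probability` is hypothetical) solves that linear equation exactly, using the risks (0, 1/2) for d₁ and (1, 0) for d₃ read from the risk table; it assumes the game has no saddle point, as is the case here.

```python
from fractions import Fraction

# Risks (theta1, theta2) of the two admissible decision functions.
R_D1 = (Fraction(0), Fraction(1, 2))
R_D3 = (Fraction(1), Fraction(0))


def equalizing_probability(r_first, r_second):
    """Probability p of using the first rule so that the expected risk
    p * r_first[j] + (1 - p) * r_second[j] is the same for j = 0 and j = 1.

    Assumes the 2 x 2 game has no saddle point, so the denominator is nonzero.
    """
    numerator = r_second[1] - r_second[0]
    denominator = (r_first[0] - r_second[0]) - (r_first[1] - r_second[1])
    return numerator / denominator


p = equalizing_probability(R_D1, R_D3)
value = p * R_D1[0] + (1 - p) * R_D3[0]  # expected risk, the same for both states
print(p, 1 - p, value)  # 2/3 1/3 1/3
```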
Incidentally, we formulated this problem with reference to a two-headed coin
and an ordinary coin, but we could just as well have formulated it more abstractly
as a decision problem in which we must decide on the basis of a single observation
whether a random variable has the Bernoulli distribution (see Section 5.3) with
the parameter θ = 0 or the parameter θ = 1/2, where θ is the probability of a tail (x = 1).

To illustrate further the concepts of a loss function and a risk function, let
us consider the following example, in which Nature as well as the statistician
has a continuum of strategies:


EXAMPLE 9.7 


A random variable has the uniform density

$$ f(x) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere} \end{cases} $$

and we want to estimate the parameter θ (the "move" of Nature) on the basis
of a single observation. If the decision function is to be of the form d(x) = kx,
where k ≥ 1, and the losses are proportional to the absolute value of the errors,
that is,

$$ L(kx, \theta) = c\,|kx - \theta| $$

where c is a positive constant, find the value of k which will minimize the risk.


Solution

For the risk function we get

$$ R(d, \theta) = \int_0^{\theta/k} c(\theta - kx)\,\frac{1}{\theta}\,dx + \int_{\theta/k}^{\theta} c(kx - \theta)\,\frac{1}{\theta}\,dx = c\,\theta\left(\frac{k}{2} - 1 + \frac{1}{k}\right) $$

and there is nothing we can do about the factor θ, but it can easily be
verified that k = √2 will minimize k/2 − 1 + 1/k. Thus, if we actually took
the observation and got x = 5, our estimate of θ would be 5√2, or approxi-
mately 7.07. ▲
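The closed form for the risk and the minimizing value k = √2 can be checked numerically. The Python sketch below is illustrative only: it approximates the risk integral with a simple midpoint rule, compares it with cθ(k/2 − 1 + 1/k), and then searches a grid of k values in [1, 3]. The function names, the choice c = 1, and the grid are assumptions made for the illustration.

```python
import math


def risk_numeric(k, theta, c=1.0, steps=100_000):
    """Midpoint-rule approximation of R(d, theta), the integral of c|kx - theta|/theta over 0 < x < theta."""
    h = theta / steps
    return sum(c * abs(k * (i + 0.5) * h - theta) / theta * h for i in range(steps))


def risk_closed_form(k, theta, c=1.0):
    """The closed form c * theta * (k/2 - 1 + 1/k)."""
    return c * theta * (k / 2 - 1 + 1 / k)


# The two agree (up to discretization error) for several k.
for k in (1.0, 1.5, 2.0):
    assert abs(risk_numeric(k, 3.0) - risk_closed_form(k, 3.0)) < 1e-6

# Minimize k/2 - 1 + 1/k over a grid of k values from 1 to 3 in steps of 0.0001.
grid = [1 + i / 10_000 for i in range(20_001)]
k_best = min(grid, key=lambda k: risk_closed_form(k, 1.0))
print(round(k_best, 4), round(math.sqrt(2), 4))  # 1.4142 1.4142
print(round(5 * k_best, 2))                      # estimate of theta when x = 5: 7.07
```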
9.4 DECISION CRITERIA


In Example 9.7 we were able to find a decision function which minimized the
risk regardless of the true state of Nature (that is, regardless of the true value of
the parameter θ), but this is the exception rather than the rule. Had we not
limited ourselves to decision functions of the form d(x) = kx, then the decision
function given by d(x) = θ₁ would be best when θ happens to equal θ₁, the one
given by d(x) = θ₂ would be best when θ happens to equal θ₂, ..., and it is
obvious that there can be no decision function which is best for all values of θ.

In general, we thus have to be satisfied with decision functions that are best
only with respect to some criterion, and the two criteria which we shall study in
this chapter are: (1) the minimax criterion, according to which we choose the
decision function d for which R(d, θ), maximized with respect to θ, is a minimum;
and (2) the Bayes criterion, according to which we choose the decision function
d for which the Bayes risk E[R(d, Θ)] is a minimum, where the expectation is
taken with respect to Θ. This requires that we look upon Θ as a random variable
having a given distribution.

It is of interest to note that in the example of Section 9.1 we used both of
these criteria. When we quoted odds for a recession, we assigned probabilities
to the two states of Nature, θ₁ and θ₂, and when we suggested that the manufac-
turer minimize his expected loss, we suggested, in fact, that he use the Bayes
criterion. Also, when we asked on page 308 what the manufacturer might do if
he were a confirmed pessimist, we suggested that he would protect himself against
the worst that can happen by using the minimax criterion.


9.5 THE MINIMAX CRITERION 


If we apply the minimax criterion to the illustration of Section 9.3, dealing with
the coin which is either two-headed or balanced with heads on one side and tails
on the other, we find from the table on page 322 with d₂ and d₄ deleted that for
d₁ the maximum risk is 1/2, for d₃ the maximum risk is 1, and, hence, the one that
minimizes the maximum risk is d₁.


EXAMPLE 9.8 


Use the minimax criterion to estimate the parameter θ of a binomial distribution
on the basis of a value of the random variable X, the observed number of successes
in n trials, when the decision function is of the form

$$ d(x) = \frac{x + a}{n + b} $$

where a and b are constants, and the loss function is given by

$$ L\left(\frac{x + a}{n + b}, \theta\right) = c\left(\frac{x + a}{n + b} - \theta\right)^2 $$

where c is a positive constant.


Solution 


The problem is to find the values of a and b which will minimize the
corresponding risk function after it has been maximized with respect to θ.
After all, we have control over the choice of a and b, while Nature (our
presumed opponent) has control over the choice of θ.

Since E(X) = nθ and E(X²) = nθ(1 − θ + nθ), as we saw on page
180, it follows that

$$ R(d, \theta) = E\left[c\left(\frac{X + a}{n + b} - \theta\right)^2\right] = \frac{c}{(n + b)^2}\left[\theta^2(b^2 - n) + \theta(n - 2ab) + a^2\right] $$

and, using calculus, we could find the value of θ which maximizes this
expression, and then minimize R(d, θ) for this value of θ with respect to
a and b. This is not particularly difficult, but it will be left to the reader in
Exercise 4 on page 329 as it involves some tedious algebraic detail. ▲


To simplify the work in a problem of this kind, we can often use the equalizer
principle, according to which (under fairly general conditions) the risk function
of a minimax decision rule is a constant; for instance, it tells us that in Example
9.8 the risk function should not depend on the value of θ.* To justify this principle,
at least intuitively, observe that in Example 9.6 the minimax strategy of Player
A leads to an expected loss of $3.41 regardless of whether Player B chooses
Strategy 1 or Strategy 2.

To make the risk function of Example 9.8 independent of θ, the coefficients
of θ and θ² must both equal 0 in the expression for R(d, θ). This yields b² − n = 0
and n − 2ab = 0, and, hence, a = ½√n and b = √n. Thus, the minimax decision
function is given by

$$ d(x) = \frac{x + \frac{1}{2}\sqrt{n}}{n + \sqrt{n}} $$
* The exact conditions under which the equalizer principle holds are given in the 
book by T. S. Ferguson listed among the references at the end of this chapter. 
and if we actually obtained 39 successes in 100 trials, we would estimate the
parameter of this binomial distribution as

$$ \frac{39 + 5}{100 + 10} = 0.40 $$
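The equalizer property can also be checked directly, without the algebra, by summing over the binomial distribution. In the following Python sketch (names hypothetical, c taken as 1), the exact risk of d(x) = (x + a)/(n + b) with a = ½√n and b = √n is evaluated at several values of θ for n = 100; it should be the same at every θ, namely c/[4(1 + √n)²].

```python
import math


def risk(theta, n, a, b, c=1.0):
    """Exact risk E[c((X + a)/(n + b) - theta)^2] for X ~ binomial(n, theta)."""
    return sum(math.comb(n, x) * theta**x * (1 - theta)**(n - x)
               * c * ((x + a) / (n + b) - theta) ** 2
               for x in range(n + 1))


n = 100
a, b = math.sqrt(n) / 2, math.sqrt(n)  # the equalizer choice a = sqrt(n)/2, b = sqrt(n)
risks = [risk(theta, n, a, b) for theta in (0.1, 0.3, 0.5, 0.7, 0.9)]
print([round(r, 8) for r in risks])   # the same value at every theta
print(round(1 / (4 * (1 + math.sqrt(n)) ** 2), 8))
print((39 + a) / (n + b))              # estimate from 39 successes in 100 trials
```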


9.6 THE BAYES CRITERION 


To apply the Bayes criterion in the illustration of Section 9.3, the one dealing
with the coin which is either two-headed or balanced with heads on one side
and tails on the other, we will have to assign probabilities to the two strategies
of Nature, θ₁ and θ₂. If we assign θ₁ and θ₂, respectively, the probabilities p and
1 − p, it can be seen from the table on page 322 that for d₁ the Bayes risk is

$$ 0 \cdot p + \tfrac{1}{2}(1 - p) = \tfrac{1}{2}(1 - p) $$

and that for d₃ the Bayes risk is

$$ 1 \cdot p + 0 \cdot (1 - p) = p $$

It follows that the Bayes risk of d₁ is less than that of d₃ (and d₁ is to be preferred
to d₃) when p > 1/3, and that the Bayes risk of d₃ is less than that of d₁ (and d₃
is to be preferred to d₁) when p < 1/3. When p = 1/3, the two Bayes risks are equal,
and we can use either d₁ or d₃.
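The comparison of the two Bayes risks amounts to a simple rule in p. The Python sketch below (function names hypothetical) evaluates both Bayes risks exactly and reports which decision function is preferred for a few illustrative prior probabilities p of θ₁:

```python
from fractions import Fraction


def bayes_risk_d1(p):
    """Bayes risk of d1 when P(theta1) = p: 0 * p + (1/2)(1 - p)."""
    return Fraction(1, 2) * (1 - p)


def bayes_risk_d3(p):
    """Bayes risk of d3 when P(theta1) = p: 1 * p + 0 * (1 - p)."""
    return p


def bayes_choice(p):
    """Return the decision function with the smaller Bayes risk, or 'either' on a tie."""
    r1, r3 = bayes_risk_d1(p), bayes_risk_d3(p)
    if r1 < r3:
        return "d1"
    if r3 < r1:
        return "d3"
    return "either"


for p in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 2)):
    print(p, bayes_choice(p))
```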


EXAMPLE 9.9 


With reference to Example 9.7, suppose that the parameter of the uniform density
is looked upon as a random variable with the probability density

$$ h(\theta) = \begin{cases} \theta e^{-\theta} & \text{for } \theta > 0 \\ 0 & \text{elsewhere} \end{cases} $$

If there is no restriction on the form of the decision function and the loss function
is quadratic, that is, its values are given by

$$ L[d(x), \theta] = c\,[d(x) - \theta]^2 $$

find the decision function which minimizes the Bayes risk.




Solution 


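Since Θ is now a random variable, we look upon the original density function
as the conditional density

$$ f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere} \end{cases} $$

and, letting f(x, θ) = f(x | θ) · h(θ) in accordance with Definition 3.13 on
page 123, we get

$$ f(x, \theta) = \begin{cases} e^{-\theta} & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere} \end{cases} $$

As the reader will be asked to verify in Exercise 6 on page 329, this yields

$$ g(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases} $$

for the marginal density of X and

$$ \varphi(\theta \mid x) = \begin{cases} e^{x - \theta} & \text{for } \theta > x \\ 0 & \text{elsewhere} \end{cases} $$

for the conditional density of Θ given X = x.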


Now, the Bayes risk E[R(d, Θ)] which we shall want to minimize is
given by the double integral

$$ \int_0^{\infty}\left[\int_0^{\theta} c\,[d(x) - \theta]^2 f(x \mid \theta)\,dx\right] h(\theta)\,d\theta $$

which can also be written as

$$ \int_0^{\infty}\left[\int_x^{\infty} c\,[d(x) - \theta]^2 \varphi(\theta \mid x)\,d\theta\right] g(x)\,dx $$

making use of the fact that f(x | θ) · h(θ) = φ(θ | x) · g(x) and changing the
order of integration. To minimize this double integral, we must choose d(x)
for each x so that the integral


$$ \int_x^{\infty} c\,[d(x) - \theta]^2 \varphi(\theta \mid x)\,d\theta = \int_x^{\infty} c\,[d(x) - \theta]^2 e^{x - \theta}\,d\theta $$




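is as small as possible. Differentiating with respect to d(x) and putting the
derivative equal to 0, we get

$$ 2c\,e^{x}\int_x^{\infty}[d(x) - \theta]\,e^{-\theta}\,d\theta = 0 $$

This yields

$$ d(x)\int_x^{\infty} e^{-\theta}\,d\theta - \int_x^{\infty}\theta e^{-\theta}\,d\theta = 0 $$

and, finally,

$$ d(x) = \frac{\int_x^{\infty}\theta e^{-\theta}\,d\theta}{\int_x^{\infty} e^{-\theta}\,d\theta} = \frac{(x + 1)e^{-x}}{e^{-x}} = x + 1 $$

Thus, if the observation we get is x = 5 (as on page 323), this decision
function gives the Bayes estimate 5 + 1 = 6 for the parameter of the original
uniform density. ▲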

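Since the posterior density here is φ(θ | x) = e^(x − θ) for θ > x, the Bayes decision function under quadratic loss is the posterior mean, and the result d(x) = x + 1 can be confirmed numerically. The following Python sketch approximates the integral of θ φ(θ | x) with the midpoint rule, truncating the range of integration at an assumed upper limit beyond which the exponential tail is negligible; the function names and numerical settings are our own choices.

```python
import math


def posterior_density(theta, x):
    """phi(theta | x) = exp(x - theta) for theta > x, and 0 elsewhere."""
    return math.exp(x - theta) if theta > x else 0.0


def bayes_estimate(x, upper=60.0, steps=200_000):
    """Posterior mean of theta, the minimizer of the quadratic Bayes risk.

    Integrates theta * phi(theta | x) over (x, x + upper) with the midpoint rule;
    the tail beyond x + upper is negligible for this exponential density.
    """
    h = upper / steps
    return sum((x + (i + 0.5) * h) * posterior_density(x + (i + 0.5) * h, x) * h
               for i in range(steps))


print(round(bayes_estimate(5.0), 4))  # approximately x + 1 = 6
```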

THEORETICAL EXERCISES 


1. With reference to the illustration on page 320, show that even if the coin is 
flipped n times, there are only two admissible decision functions. Also 
construct a table showing the values of the risk function corresponding to 
these two decision functions and the two states of Nature. 


2. With reference to Example 9.7, show that if the losses are proportional to
the squared errors instead of their absolute values, the risk function becomes

$$ R(d, \theta) = \frac{c\,\theta^2}{3}\left(k^2 - 3k + 3\right) $$

and its minimum is at k = 3/2.


3. A statistician has to decide on the basis of a single observation whether the
parameter θ of the density

$$ f(x) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere} \end{cases} $$

equals θ₁ or θ₂, where θ₁ < θ₂. If he decides on θ₁ when the observed value
is less than the constant k, on θ₂ when the observed value is greater than or
equal to the constant k, and he is fined C dollars for making the wrong
decision, which value of k will minimize the maximum risk?

4. Find the value of θ which maximizes the risk function of Example 9.8, and
then find the values of a and b which minimize the risk function for that
value of θ. Compare the results with those given on page 325.

5. If we assume in Example 9.8 that Θ is a random variable having the uniform
density of Section 6.2 with α = 0 and β = 1, show that the Bayes risk is
given by

$$ \frac{c}{(n + b)^2}\left[\frac{1}{3}(b^2 - n) + \frac{1}{2}(n - 2ab) + a^2\right] $$

Also show that this Bayes risk is a minimum when a = 1 and b = 2, so that
the optimum Bayes decision rule is given by d(x) = (x + 1)/(n + 2).
6. Verify the results given on page 327 for the marginal density of X and the
conditional density of Θ given X = x.


7. Suppose that we want to estimate the parameter θ of the geometric distribution
of Section 5.5 on the basis of a single observation. If the loss function is
given by

$$ L[d(x), \theta] = c\,[d(x) - \theta]^2 $$

and Θ is looked upon as a random variable having the uniform density
h(θ) = 1 for 0 < θ < 1 and h(θ) = 0 elsewhere, duplicate the steps in
Example 9.9 to show that

(a) the conditional density of Θ given X = x is

$$ \varphi(\theta \mid x) = \begin{cases} x(x + 1)\,\theta(1 - \theta)^{x - 1} & \text{for } 0 < \theta < 1 \\ 0 & \text{elsewhere} \end{cases} $$

(b) the Bayes risk is minimized by the decision function

$$ d(x) = \frac{2}{x + 2} $$

(Hint: Make use of the fact that the integral of any beta density is equal to 1.)
APPLIED EXERCISES


8. A statistician has to decide on the basis of one observation whether the
parameter θ of a Bernoulli distribution is 0, 1/2, or 1; his loss in dollars (a
penalty which is deducted from his fee) is 100 times the absolute value of
his error.

(a) Construct a table showing the nine possible values of the loss function.

(b) List the nine possible decision functions and construct a table showing
all the values of the corresponding risk function.

(c) Show that five of the decision functions are not admissible, and that
according to the minimax criterion the remaining decision functions
are all equally good.

(d) Which decision function is best according to the Bayes criterion, if the
three possible values of the parameter θ are regarded as equally likely?


9. A statistician has to decide on the basis of two observations whether the
parameter θ of a binomial distribution is 1/4 or 1/2; his loss (a penalty which is
deducted from his fee) is $160 if he is wrong.

(a) Construct a table showing the four possible values of the loss function.

(b) List the eight possible decision functions and construct a table showing
all the values of the corresponding risk function.

(c) Show that three of the decision functions are not admissible.

(d) Find the decision function which is best according to the minimax
criterion.

(e) Find the decision function which is best according to the Bayes criterion,
if the probabilities assigned to θ = 1/4 and θ = 1/2 are, respectively,
3 and 5.


10. A manufacturer produces an item consisting of two components, which must
both work for the item to function properly. The cost of returning one of the
items to the manufacturer for repairs is α dollars, the cost of inspecting one
of the components is β dollars, and the cost of repairing a faulty component
is φ dollars. He can ship each item without inspection with the guarantee
that it will be put into perfect working condition at his factory in case it does
not work; he can inspect both components and repair them if necessary; or
he can randomly select one of the components and ship the item with the
original guarantee if it works, or repair it and also check the other component.
(a) Construct a table showing the manufacturer's expected losses corre-
sponding to his three "strategies" and the three "states" of Nature that
0, 1, or 2 of the components do not work.
(b) What should the manufacturer do if α = $25.00, β = $10.00, and he
wants to minimize his maximum expected losses?
(c) What should the manufacturer do to minimize his Bayes risk if α =
$10.00, β = $12.00, φ = $30.00, and he feels that the probabilities for
0, 1, and 2 defective components are, respectively, 0.70, 0.20, and 0.10?
10

Point Estimation


10.1 INTRODUCTION


Traditionally, statistical inference has been divided into problems of estimation 
and the testing of hypotheses, and we shall continue to make this distinction 
mainly to facilitate the organization of the material which we want to present in 
this book. Let us point out, though, that all problems of statistical inference, 
namely, all problems in which we make generalizations on the basis of sample 
data, are essentially decision problems, and, hence, can be handled by a unified 
approach like that presented in Chapter 9. The main distinction is that in problems 
of estimation we must choose a value of a parameter (that is, we must choose 
one particular strategy of Nature) from a possible continuum of alternatives, 
while in the testing of hypotheses we must decide whether to accept or reject 
one specified value or a set of specified values of a parameter. 

As we already pointed out on page 324, perfect decision functions do not 
exist, and this is another reason for distinguishing between problems of estimation 
and the testing of hypotheses—the methods which we are willing to accept as 
"second best" or the restrictions which we must impose to obtain optimum 
decision functions differ somewhat for the two kinds of problems. 


10.2 POINT ESTIMATION


If we use the value of a statistic to estimate a population parameter, this value
is a point estimate of the parameter. For example, if we use a sample mean to
estimate the mean of a population, a sample proportion to estimate the parameter
θ of a binomial population, or a sample variance to estimate the variance of a
population, we are in each case using a point estimate of the parameter in
question. These estimates are called point estimates because they are single
numbers, or points on the real axis, used, respectively, to estimate μ, θ, and σ².

The statistic whose value is used as the point estimate of a parameter is
called an estimator. Therefore, the statistic X̄ is an estimator of μ, and its value
x̄ is the point estimate. Similarly, the statistic S² is an estimator of σ², and its
value s² is the point estimate.

Since estimators are random variables, one of the key problems of point
estimation is to study their sampling distributions. For instance, when we estimate
the variance of a population on the basis of a random sample, we can hardly
expect that the value of S² we get will actually equal σ², but it would be reassuring,
at least, to know whether we can expect it to be close. Also, if we must decide
whether to use a sample mean or a sample median to estimate the mean of
a population, it would be important to know, among other things, whether X̄ or
X̃ is more likely to yield a value which is actually close.

Various statistical properties of estimators can, thus, be used to decide
which estimator is most appropriate in a given situation, which will expose us
to the smallest risk, which will give us the most information at the lowest cost,
and so forth. The particular properties of estimators which we shall discuss in
Sections 10.3 through 10.5 are unbiasedness, minimum variance, consistency, relative
efficiency, and sufficiency.


10.3 UNBIASED ESTIMATORS 


Since there can be no perfect estimator which always gives the right answer, it 
would seem reasonable that an estimator should do so at least on the average. 
In other words, it would seem desirable that the expected value of an estimator 
equal the parameter which it is supposed to estimate. If this is the case, the 
estimator is said to be unbiased; otherwise, it is said to be biased. Formally, 


DEFINITION 10.1 A statistic Θ̂ is an unbiased estimator of the parameter θ
if and only if E(Θ̂) = θ.
According to this definition, it follows from Theorems 8.1 and 8.5 that X̄ is
an unbiased estimator of μ for any population whose mean exists. The following
are further examples of unbiased as well as biased estimators:


EXAMPLE 10.1 


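If X has a binomial distribution with the parameters n and θ, show that X/n, the
observed proportion of successes, is an unbiased estimator of the parameter θ.


Solution


Since E(X) = nθ, it follows that

$$ E\left(\frac{X}{n}\right) = \frac{1}{n} \cdot E(X) = \frac{1}{n} \cdot n\theta = \theta $$

▲

EXAMPLE 10.2
Show that the minimax estimator of the binomial parameter θ given on page 325
is biased.
Solution


Since E(X) = nθ, it follows that

$$ E\left(\frac{X + \frac{1}{2}\sqrt{n}}{n + \sqrt{n}}\right) = \frac{E(X) + \frac{1}{2}\sqrt{n}}{n + \sqrt{n}} = \frac{n\theta + \frac{1}{2}\sqrt{n}}{n + \sqrt{n}} $$

which, in general, is not equal to θ; it equals θ only when θ = 1/2. ▲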
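The bias can also be seen numerically. This Python sketch (names hypothetical) computes E[(X + ½√n)/(n + √n)] by summing over the binomial distribution for n = 25 and compares the resulting bias with the closed form √n(1/2 − θ)/(n + √n), which vanishes only at θ = 1/2:

```python
import math


def expected_value(theta, n):
    """E[(X + sqrt(n)/2) / (n + sqrt(n))] for X ~ binomial(n, theta), summed over the pmf."""
    s = math.sqrt(n)
    return sum(math.comb(n, x) * theta**x * (1 - theta)**(n - x) * (x + s / 2) / (n + s)
               for x in range(n + 1))


def bias_closed_form(theta, n):
    """sqrt(n) * (1/2 - theta) / (n + sqrt(n)), obtained from E(X) = n * theta."""
    s = math.sqrt(n)
    return s * (0.5 - theta) / (n + s)


n = 25
for theta in (0.1, 0.5, 0.9):
    print(theta, round(expected_value(theta, n) - theta, 6), round(bias_closed_form(theta, n), 6))
```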


We can now explain why we divided by n − 1 and not by n when we
defined the sample variance: it makes S² an unbiased estimator of σ² for random
samples from infinite populations.


THEOREM 10.1 If S² is the variance of a random sample from an infinite
population, then E(S²) = σ².


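Proof. By Definition 8.2,

$$ \begin{aligned}
E(S^2) &= E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] \\
&= \frac{1}{n-1}\, E\left[\sum_{i=1}^{n}\{(X_i - \mu) - (\bar{X} - \mu)\}^2\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n} E\{(X_i - \mu)^2\} - n\,E\{(\bar{X} - \mu)^2\}\right]
\end{aligned} $$

where the last step uses the fact that Σ(Xᵢ − μ) = n(X̄ − μ), so that the cross-product
term in the expanded square equals −2n(X̄ − μ)². Then, since E[(Xᵢ − μ)²] = σ² and
E[(X̄ − μ)²] = σ²/n, it follows that

$$ E(S^2) = \frac{1}{n-1}\left[n\sigma^2 - n \cdot \frac{\sigma^2}{n}\right] = \sigma^2 $$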
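Theorem 10.1 can be illustrated by exact enumeration. In the Python sketch below we assume, purely for illustration, a population in which X takes the values 0, 1, 2 with equal probabilities, so that independent draws play the role of sampling from an infinite population. Averaging over all 27 equally likely samples of size 3 gives E(S²) = σ² exactly, whereas the divisor n gives (n − 1)σ²/n:

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: X = 0, 1, 2 with probability 1/3 each.
VALUES = (0, 1, 2)
MU = Fraction(sum(VALUES), len(VALUES))
SIGMA2 = sum((Fraction(v) - MU) ** 2 for v in VALUES) / len(VALUES)  # 2/3


def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean, computed exactly."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - xbar) ** 2 for x in sample)


n = 3
samples = list(product(VALUES, repeat=n))  # 27 equally likely ordered samples
mean_s2 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
mean_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
print(SIGMA2, mean_s2, mean_div_n)  # 2/3 2/3 4/9
```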


Although S² is an unbiased estimator of the variance of an infinite popula-
tion, it is not an unbiased estimator of the variance of a finite population, and
in neither case is S an unbiased estimator of σ. The bias of S is discussed, among
others, in the book by E. S. Keeping listed on page 363.

If we have to choose one of several unbiased estimators, we usually take
the one whose sampling distribution has the smallest variance. We already
indicated this on page 302, although we did not mention at the time that the
sample median is also an unbiased estimator of the mean μ of a normal popula-
tion. To check whether a given unbiased estimator has the smallest possible
variance, namely, whether it is a minimum variance unbiased estimator, we make
use of the fact that if Θ̂ is an unbiased estimator of θ, it can be shown under very
general conditions (referred to on page 363) that the variance of Θ̂ must satisfy
the inequality

$$ \operatorname{var}(\hat{\Theta}) \ge \frac{1}{n \cdot E\left[\left(\dfrac{\partial \ln f(X)}{\partial \theta}\right)^2\right]} $$

where f(x) is the value of the population density at x, and n is the size of the
random sample. This inequality, the Cramér-Rao inequality, leads to the following
result:


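THEOREM 10.2 If Θ̂ is an unbiased estimator of θ and

$$ \operatorname{var}(\hat{\Theta}) = \frac{1}{n \cdot E\left[\left(\dfrac{\partial \ln f(X)}{\partial \theta}\right)^2\right]} $$

then Θ̂ is a minimum variance unbiased estimator of θ.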
Here, the quantity in the denominator is referred to as the information about
θ which is supplied by the sample (see also Exercise 11 on page 346). Thus, the
greater the variance, the less the "information."


EXAMPLE 10.3 


Show that X̄ is a minimum variance unbiased estimator of the mean μ of a normal
population.


Solution 


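Since

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} \qquad \text{for } -\infty < x < \infty $$

it follows that

$$ \ln f(x) = -\ln \sigma\sqrt{2\pi} - \frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2 $$

so that

$$ \frac{\partial \ln f(x)}{\partial \mu} = \frac{1}{\sigma}\left(\frac{x - \mu}{\sigma}\right) $$

and, hence,

$$ E\left[\left(\frac{\partial \ln f(X)}{\partial \mu}\right)^2\right] = \frac{1}{\sigma^2}\, E\left[\left(\frac{X - \mu}{\sigma}\right)^2\right] = \frac{1}{\sigma^2} \cdot 1 = \frac{1}{\sigma^2} $$

Thus,

$$ \frac{1}{n \cdot E\left[\left(\frac{\partial \ln f(X)}{\partial \mu}\right)^2\right]} = \frac{1}{n \cdot \frac{1}{\sigma^2}} = \frac{\sigma^2}{n} $$

and since X̄ is unbiased and var(X̄) = σ²/n according to Theorem 8.1, it follows
that X̄ is a minimum variance unbiased estimator of μ. ▲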
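The Cramér-Rao computation in Example 10.3 can be checked numerically for particular values. The Python sketch below assumes μ = 1, σ = 2, and n = 10 (illustrative choices) and approximates E[(∂ ln f(X)/∂μ)²] with the midpoint rule on a truncated range; the result should be close to 1/σ², and the bound close to σ²/n = var(X̄):

```python
import math


def score_squared_expectation(mu, sigma, half_width=12.0, steps=200_000):
    """E[(d/dmu ln f(X))^2] for X ~ N(mu, sigma^2), by the midpoint rule.

    The score is (x - mu) / sigma**2; the integral is truncated at
    mu +/- half_width * sigma, where the normal density is negligible.
    """
    lo = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += ((x - mu) / sigma**2) ** 2 * density * h
    return total


mu, sigma, n = 1.0, 2.0, 10
info = score_squared_expectation(mu, sigma)  # close to 1/sigma^2 = 0.25
bound = 1 / (n * info)                       # Cramer-Rao lower bound
print(round(info, 6), round(bound, 6), sigma**2 / n)
```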
It would be erroneous to conclude from this example that X̄ is a minimum
variance unbiased estimator of the mean μ of every population. In Exercise 17
on page 347 the reader will be asked to verify that this is not so for random
samples of size n = 3 from the continuous uniform population defined on the
interval from θ − 1/2 to θ + 1/2.

As we have indicated, unbiased estimators are usually compared in terms
of their variances. If Θ̂₁ and Θ̂₂ are two unbiased estimators of a parameter θ and
the variance of Θ̂₁ is less than the variance of Θ̂₂, we say that Θ̂₁ is relatively more
efficient. Also, we use the ratio

$$ \frac{\operatorname{var}(\hat{\Theta}_1)}{\operatorname{var}(\hat{\Theta}_2)} $$

to measure the efficiency of Θ̂₂ relative to Θ̂₁.


EXAMPLE 10.4 


In estimating the mean μ of a normal population on the basis of a random
sample of size 2n + 1, what is the efficiency of the median relative to the mean?


Solution 
From Theorem 8.1 we know that X̄ is unbiased and that

$$ \operatorname{var}(\bar{X}) = \frac{\sigma^2}{2n + 1} $$

So far as X̃ is concerned, it is unbiased by virtue of the symmetry of the
normal distribution about its mean, and we know from the discussion
following Theorem 8.15 that for large samples

$$ \operatorname{var}(\tilde{X}) \approx \frac{\pi\sigma^2}{4n} $$

Thus, for large samples, the efficiency of the median relative to the mean
is approximately

$$ \frac{\operatorname{var}(\bar{X})}{\operatorname{var}(\tilde{X})} = \frac{\sigma^2/(2n + 1)}{\pi\sigma^2/(4n)} = \frac{4n}{\pi(2n + 1)} $$

and the asymptotic efficiency of the median with respect to the mean is

$$ \lim_{n \to \infty} \frac{4n}{\pi(2n + 1)} = \frac{2}{\pi} $$

or about 64 percent. ▲
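A small simulation shows the efficiency of the median relative to the mean in action. The Python sketch below draws, with a fixed seed, repeated normal samples of size 2n + 1 for n = 50 and compares the ratio of the simulated variances with 4n/[π(2n + 1)] and with 2/π. The number of repetitions and the seed are arbitrary choices, so the simulated ratio only approximates the large-sample value:

```python
import math
import random
import statistics


def efficiency_of_median(n, reps=4000, seed=1):
    """Monte Carlo estimate of var(mean) / var(median) for normal samples of size 2n + 1."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(2 * n + 1)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)


n = 50
print(round(efficiency_of_median(n), 3))          # Monte Carlo estimate
print(round(4 * n / (math.pi * (2 * n + 1)), 3))  # large-sample approximation
print(round(2 / math.pi, 3))                      # asymptotic efficiency, about 0.637
```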


The result of the preceding example may be interpreted as follows: For
large samples, the mean requires only 64 percent as many observations as the
median to estimate μ with the same reliability.

It is important to note that we have limited our discussion of relative
efficiency to unbiased estimators. If we included biased estimators, we could
always assure ourselves of an estimator with zero variance by letting its values
equal the same constant regardless of the data which we may obtain. Thus, if Θ̂
is not an unbiased estimator of the parameter θ, it is preferable to judge its merits
and make efficiency comparisons on the basis of the mean square error

$$ E[(\hat{\Theta} - \theta)^2] $$

instead of the variance of Θ̂.
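To see why the mean square error is the more appropriate yardstick for biased estimators, compare the unbiased estimator X/n of the binomial parameter θ with the biased minimax estimator (X + ½√n)/(n + √n). The Python sketch below uses the closed forms θ(1 − θ)/n and 1/[4(1 + √n)²] (the latter follows from the risk function of Example 9.8 with c = 1, a = ½√n, b = √n); for n = 100, neither estimator has the smaller mean square error for every θ:

```python
import math


def mse_proportion(theta, n):
    """Mean square error of X/n: unbiased, so it equals var(X/n) = theta(1 - theta)/n."""
    return theta * (1 - theta) / n


def mse_minimax(n):
    """Mean square error of (X + sqrt(n)/2)/(n + sqrt(n)), the same for every theta."""
    return 1 / (4 * (1 + math.sqrt(n)) ** 2)


n = 100
for theta in (0.05, 0.25, 0.5):
    print(theta, round(mse_proportion(theta, n), 6), round(mse_minimax(n), 6))
```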


10.4 CONSISTENT ESTIMATORS 


The idea of choosing a minimum variance unbiased estimator is closely related
to that of minimizing the risk function when the loss function is quadratic (as
in Example 9.8). Of course, there are other kinds of loss functions and other
ways of measuring the chance fluctuations of statistics. The fact that the variance
may not even provide a good criterion for this purpose is illustrated by the
following example: Suppose that we want to estimate on the basis of one
observation the parameter θ of the population

$$ f(x) = w \cdot \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \theta}{\sigma}\right)^2} + (1 - w) \cdot \frac{1}{\pi} \cdot \frac{1}{1 + (x - \theta)^2} $$

for −∞ < x < ∞ and 0 < w < 1. Evidently, this density is a weighted mean of
a normal density with the mean θ and the variance σ² and a Cauchy density (see
Exercise 3 on page 217) with β = 1 and α = θ. Now, if w is very close to 1 and
σ is very small, the probability that a random variable having this distribution
will take on a value which is very close to θ, and hence is a very good estimate
of θ, is practically 1. Yet, the variance of this estimator exceeds any bound, which
follows from the fact that the variance of the Cauchy distribution does not exist.
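The statement that the variance of the Cauchy distribution does not exist can be made concrete: the integral of x² against the Cauchy density (here with α = 0 and β = 1, an illustrative choice) over −L < x < L equals (2/π)(L − arctan L), which grows without bound as L increases. The Python sketch below checks this closed form with a midpoint rule and prints a few truncated values:

```python
import math


def truncated_second_moment(L):
    """Integral of x**2 times the Cauchy density 1/(pi(1 + x**2)) over -L < x < L.

    Closed form: (2/pi) * (L - arctan L), which grows without bound as L grows.
    """
    return (2 / math.pi) * (L - math.atan(L))


def midpoint_check(L, steps=200_000):
    """Midpoint-rule approximation of the same integral, to confirm the closed form."""
    h = 2 * L / steps
    total = 0.0
    for i in range(steps):
        x = -L + (i + 0.5) * h
        total += x * x / (math.pi * (1 + x * x)) * h
    return total


for L in (10, 100, 1_000, 10_000):
    print(L, round(truncated_second_moment(L), 2))
```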

The preceding example is somewhat out of the ordinary, but it suggests
that we pay more attention to the probabilities of estimators taking on values
close to the parameters they are supposed to estimate. The reader may recall that
we already touched upon questions concerning the "closeness" of estimates in
Sections 5.4 and 8.2. Basing our argument on Chebyshev's theorem, we showed
in Section 5.4 that when n → ∞ the probability approaches 0 that the sample
proportion X/n will take on a value which differs from the binomial parameter θ
by more than any arbitrary constant c > 0. Also using Chebyshev's theorem, we
showed in Section 8.2 that when n → ∞ the probability approaches 0 that X̄ will
take on a value which differs from μ, the mean of the population sampled, by
more than any arbitrary constant c > 0.

In both of the examples of the preceding paragraph we were practically 
assured that, at least for large n, the estimators will take on values which are 
very close to the respective parameters. This concept of closeness is generalized 
in the following definition of consistency: 


DEFINITION 10.2 The statistic Θ̂ is a consistent estimator of the parameter
θ if and only if for each positive constant c,

$$ \lim_{n \to \infty} P(|\hat{\Theta} - \theta| \ge c) = 0 $$

or, equivalently, if and only if

$$ \lim_{n \to \infty} P(|\hat{\Theta} - \theta| < c) = 1 $$
In accordance with this definition, we showed in Section 5.4 that X/n is a consistent
estimator of the binomial parameter θ, and in Section 8.2 that X̄ is a consistent
estimator of the mean μ of a population which has a finite variance.
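Consistency of X/n can be watched directly by computing P(|X/n − θ| ≥ c) from the binomial distribution for increasing n. In the Python sketch below, θ = 0.3 and c = 0.05 are illustrative choices; fractions are used to decide the event |x/n − θ| ≥ c exactly, while the probabilities themselves are accumulated in floating point:

```python
import math
from fractions import Fraction


def tail_probability(n, theta, c):
    """P(|X/n - theta| >= c) for X ~ binomial(n, theta).

    theta and c are Fractions, so membership in the event is decided exactly;
    the binomial probabilities themselves are summed in floating point.
    """
    t = float(theta)
    return sum(math.comb(n, x) * t**x * (1 - t)**(n - x)
               for x in range(n + 1) if abs(Fraction(x, n) - theta) >= c)


theta, c = Fraction(3, 10), Fraction(1, 20)
probs = [tail_probability(n, theta, c) for n in (10, 100, 1000)]
for n, p in zip((10, 100, 1000), probs):
    print(n, round(p, 4))
```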

Note that consistency is an asymptotic property, namely, a limiting property 
of an estimator; informally, Definition 10.2 says that when n is sufficiently large, 
we can be practically certain that the error made with a consistent estimator will 
be less than any small preassigned positive constant. 

In actual practice, we can often judge whether an estimator is consistent
by using the following sufficient (though not necessary) conditions,
which are an immediate consequence of Chebyshev's theorem:
THEOREM 10.3 The statistic Θ̂ is a consistent estimator of the parameter
θ if

1. Θ̂ is unbiased;
2. var(Θ̂) → 0 as n → ∞.


To demonstrate that these conditions are not necessary conditions, we have only
to show that an estimator can be consistent without being unbiased. An example
is the biased estimator (X + 1)/(n + 2); in an exercise on page 347 the reader will be
asked to verify that this estimator is, indeed, a consistent estimator of the binomial
parameter θ. We should add, though, that a biased estimator can be consistent only
if it is asymptotically unbiased, namely, that it becomes unbiased when n → ∞.
For instance, the minimax estimator on page 325 is asymptotically unbiased since

$$ \lim_{n \to \infty} E\left(\frac{X + \frac{1}{2}\sqrt{n}}{n + \sqrt{n}}\right) = \lim_{n \to \infty} \frac{n\theta + \frac{1}{2}\sqrt{n}}{n + \sqrt{n}} = \theta $$
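As an illustration, consider the biased estimator (X + 1)/(n + 2) of the binomial parameter θ. Its bias (1 − 2θ)/(n + 2) and its variance nθ(1 − θ)/(n + 2)² both tend to 0, so its mean square error tends to 0, and Markov's inequality applied to (Θ̂ − θ)² gives P(|Θ̂ − θ| ≥ c) ≤ E[(Θ̂ − θ)²]/c², which yields consistency. The Python sketch below, with θ = 0.2 as an illustrative value, tabulates these quantities:

```python
def bias_and_variance(n, theta):
    """Bias and variance of (X + 1)/(n + 2) when X ~ binomial(n, theta).

    Uses E(X) = n*theta and var(X) = n*theta*(1 - theta).
    """
    bias = (n * theta + 1) / (n + 2) - theta          # equals (1 - 2*theta)/(n + 2)
    variance = n * theta * (1 - theta) / (n + 2) ** 2
    return bias, variance


theta = 0.2
for n in (10, 100, 1000, 10000):
    b, v = bias_and_variance(n, theta)
    print(n, round(b, 6), round(v, 8), round(b * b + v, 8))  # bias, variance, mean square error
```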


EXAMPLE 10.5 


Show that the sample variance $s^2$ is a consistent estimator of $\sigma^2$ for random
samples from normal populations.


Solution 


Since $s^2$ is unbiased according to Theorem 10.1, it remains to be shown
that the variance of $s^2$ approaches 0 as $n \to \infty$. Referring to the result of
Exercise 4 on page 295 (or Theorem 8.10 on which this exercise is based),
we find that

$$\operatorname{var}(s^2) = \frac{2\sigma^4}{n-1}$$

and, hence, that $\operatorname{var}(s^2) \to 0$ as $n \to \infty$, so long as $\sigma$ is finite. ▲
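The formula for $\operatorname{var}(s^2)$ can be checked by simulation. The Python sketch below assumes normal samples with $\sigma = 2$. It draws many samples of size $n$, computes $s^2$ for each, and compares the observed variance of these values with $2\sigma^4/(n-1)$. The number of trials, the seed, and the helper names are arbitrary assumptions made for illustration.

```python
import random

def sample_variance(xs):
    """s^2, computed with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulated_var_of_s2(n, sigma=2.0, trials=4000, seed=7):
    """Monte Carlo estimate of var(s^2) for normal samples of size n."""
    rng = random.Random(seed)
    s2_values = [
        sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return sample_variance(s2_values)

sigma = 2.0
for n in (5, 20, 80):
    # simulated value next to the theoretical 2 * sigma^4 / (n - 1)
    print(n, round(simulated_var_of_s2(n, sigma), 3), round(2 * sigma**4 / (n - 1), 3))
```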


10.5 SUFFICIENT ESTIMATORS


An estimator $\hat{\theta}$ is said to be sufficient if it utilizes all the information in a sample
relevant to the estimation of the population parameter $\theta$; that is, if all the
knowledge we can gain about $\theta$ by actually specifying the sample values and
their order can just as well be obtained by observing only the value of the
statistic $\hat{\theta}$.

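Formally, this can be expressed in terms of the conditional distribution
(probability distribution or density) of the sample values, given that the statistic
takes the value $\hat{\theta}$. This quantity is given by

$$f(x_1, x_2, \ldots, x_n \mid \hat{\theta}) = \frac{f(x_1, x_2, \ldots, x_n, \hat{\theta})}{g(\hat{\theta})} = \frac{f(x_1, x_2, \ldots, x_n)}{g(\hat{\theta})}$$

where $g(\hat{\theta})$ is the value of the sampling distribution of the statistic at $\hat{\theta}$. The second
equality holds for sample values that yield $\hat{\theta}$, since the value of the statistic is then
determined by them.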


If it depends on $\theta$, particular values of $x_1, x_2, \ldots, x_n$ yielding $\hat{\theta}$ are
more probable for some values of $\theta$ than for others, and the knowledge of these
sample values will help in the estimation of $\theta$. If it does not depend on $\theta$, particular
values of $x_1, x_2, \ldots, x_n$ yielding $\hat{\theta}$ are just as likely to occur for any
value of $\theta$, and the knowledge of these sample values will not help in the estimation
of $\theta$.


DEFINITION 10.3 The statistic $\hat{\theta}$ is a sufficient estimator of the parameter $\theta$
if and only if for each value of $\hat{\theta}$ the conditional distribution of the random
sample $x_1, x_2, \ldots, x_n$ given $\hat{\theta}$ is independent of $\theta$.


EXAMPLE 10.6 


If $x_1, x_2, \ldots, x_n$ are independent Bernoulli random variables with the same
parameter $\theta$, show that the statistic $\hat{\theta} = x/n$, where $x = x_1 + x_2 + \cdots + x_n$, is
a sufficient estimator of $\theta$.


Solution 


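By Definition 5.2,

$$f(x_i; \theta) = \theta^{x_i}(1-\theta)^{1-x_i} \quad \text{for } x_i = 0, 1$$

so that

$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i} = \theta^{x}(1-\theta)^{n-x} = \theta^{n\hat{\theta}}(1-\theta)^{n-n\hat{\theta}}$$

for $x_i = 0, 1$ and $i = 1, 2, \ldots, n$. Also, since $x$ is a binomial random variable
with the parameters $n$ and $\theta$, its distribution is given by

$$b(x; n, \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x}$$

and the transformation-of-variable technique of Section 7.3 yields

$$g(\hat{\theta}) = \binom{n}{n\hat{\theta}}\theta^{n\hat{\theta}}(1-\theta)^{n-n\hat{\theta}} \quad \text{for } \hat{\theta} = 0, \frac{1}{n}, \ldots, 1$$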


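Now, substituting into the formula for $f(x_1, x_2, \ldots, x_n \mid \hat{\theta})$ on page 341, we get

$$f(x_1, x_2, \ldots, x_n \mid \hat{\theta}) = \frac{f(x_1, x_2, \ldots, x_n)}{g(\hat{\theta})} = \frac{\theta^{n\hat{\theta}}(1-\theta)^{n-n\hat{\theta}}}{\binom{n}{n\hat{\theta}}\theta^{n\hat{\theta}}(1-\theta)^{n-n\hat{\theta}}} = \frac{1}{\binom{n}{n\hat{\theta}}} = \frac{1}{\binom{n}{x}}$$

for $x_i = 0, 1$ and $i = 1, 2, \ldots, n$, which evidently does not depend on $\theta$.
We conclude, therefore, that $\hat{\theta} = x/n$ is a sufficient estimator of $\theta$. ▲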
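For small $n$ the conclusion of Example 10.6 can be confirmed by brute force. The Python sketch below is illustrative; $n = 4$ and the two values of $\theta$ are arbitrary choices. It enumerates every possible Bernoulli sample and computes, with exact rational arithmetic, the conditional probability of each sample given its total $x$. It then checks that the result is the same for both values of $\theta$ and equals $1/\binom{n}{x}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_total(n, theta):
    """Map each 0/1 sample of size n to P(sample | x), where x is the sum of the sample."""
    joint = {s: theta ** sum(s) * (1 - theta) ** (n - sum(s))
             for s in product((0, 1), repeat=n)}
    # g: distribution of the total x (equivalently, of x/n)
    g = {}
    for s, p in joint.items():
        g[sum(s)] = g.get(sum(s), 0) + p
    return {s: p / g[sum(s)] for s, p in joint.items()}

n = 4
cond_a = conditional_given_total(n, Fraction(1, 3))
cond_b = conditional_given_total(n, Fraction(3, 4))
print(cond_a == cond_b)                                  # True: theta drops out
print(cond_a[(1, 1, 0, 0)], Fraction(1, comb(n, 2)))     # both equal 1/6
```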


EXAMPLE 10.7 


Show that the statistic $y = x_1 + 2x_2 + 3x_3$ is not sufficient for estimating the
parameter $\theta$ of a Bernoulli population.

Solution

We must show that

$$f(x_1, x_2, x_3 \mid y) = \frac{f(x_1, x_2, x_3, y)}{g(y)}$$

is not independent of $\theta$ for some values of $x_1$, $x_2$, and $x_3$. Thus, let us
consider the particular case where $x_1 = 1$, $x_2 = 1$, and $x_3 = 0$, so that $y = 3$ and

$$f(1, 1, 0 \mid y = 3) = \frac{f(1, 1, 0)}{f(1, 1, 0) + f(0, 0, 1)}$$

since $(1, 1, 0)$ and $(0, 0, 1)$ are the only samples that yield $y = 3$, and where

$$f(x_1, x_2, x_3) = \prod_{i=1}^{3} \theta^{x_i}(1-\theta)^{1-x_i}$$

for $x_i = 0, 1$ and $i = 1, 2, 3$. Since $f(1, 1, 0) = \theta^2(1-\theta)$ and $f(0, 0, 1) =
\theta(1-\theta)^2$, it follows that

$$f(1, 1, 0 \mid y = 3) = \frac{\theta^2(1-\theta)}{\theta^2(1-\theta) + \theta(1-\theta)^2} = \theta$$

which depends on $\theta$. Therefore, the statistic $y = x_1 + 2x_2 + 3x_3$ is not a
sufficient estimator of $\theta$. ▲
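The computation in Example 10.7 can be carried out mechanically. In the Python sketch below, the helper name and the test values of $\theta$ are arbitrary illustrative choices. The conditional probability of the sample $(1, 1, 0)$ given $y = 3$ is obtained by summing over all samples that produce the same value of $y$, and exact fractions make it plain that the answer equals $\theta$ itself.

```python
from fractions import Fraction
from itertools import product

def conditional_prob(sample, weights, theta):
    """P(sample | y) for independent Bernoulli(theta) variables, where y = sum(w_i * x_i)."""
    def f(s):
        k = sum(s)
        return theta ** k * (1 - theta) ** (len(s) - k)

    def y_of(s):
        return sum(w * x for w, x in zip(weights, s))

    y = y_of(sample)
    g_y = sum(f(s) for s in product((0, 1), repeat=len(sample)) if y_of(s) == y)
    return f(sample) / g_y

for theta in (Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)):
    print(theta, conditional_prob((1, 1, 0), (1, 2, 3), theta))
```

Replacing the weights by `(1, 1, 1)`, so that $y$ becomes the sum of the observations, gives $\frac{1}{3}$ for every $\theta$, in agreement with Example 10.6.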


As it can be quite tedious to check with Definition 10.3 whether a statistic
is a sufficient estimator of a given parameter, it is usually easier to use the
following factorization theorem:


THEOREM 10.4 The statistic $\hat{\theta}$ is a sufficient estimator of the parameter $\theta$
if and only if the joint density or probability distribution of the random
sample can be factored so that

$$f(x_1, x_2, \ldots, x_n; \theta) = g(\hat{\theta}, \theta) \cdot h(x_1, x_2, \ldots, x_n)$$

where $g(\hat{\theta}, \theta)$ depends only on $\hat{\theta}$ and $\theta$, and $h(x_1, x_2, \ldots, x_n)$ does not
depend on $\theta$.
To illustrate the use of this theorem, a proof of which is given in the book
by R. V. Hogg and A. T. Craig referred to on page 363, let us consider the
following example:


EXAMPLE 10.8 


Show that the statistic $\bar{x}$ is a sufficient estimator of the mean $\mu$ of a normal
population with the known variance $\sigma^2$.


Solution 


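Making use of the fact that

$$f(x_1, x_2, \ldots, x_n; \mu) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2}$$

and that

$$\sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}[(x_i-\bar{x}) + (\bar{x}-\mu)]^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2$$

(the cross-product term vanishes because $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$), we get

$$f(x_1, x_2, \ldots, x_n; \mu) = \left\{\frac{\sqrt{n}}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)^2}\right\} \times \left\{\frac{1}{\sqrt{n}}\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n-1} e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{\sigma}\right)^2}\right\}$$

where the first factor on the right-hand side depends only on the estimate
$\bar{x}$ and the population mean $\mu$, and the second factor does not involve $\mu$.
Therefore, according to Theorem 10.4, $\bar{x}$ is a sufficient estimator of the mean
$\mu$ of a normal population with the known variance $\sigma^2$. ▲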
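A numerical spot check of the factorization in Example 10.8 follows as a Python sketch. The data and the values of $\mu$ and $\sigma$ are made up for illustration and do not come from the text. For every $\mu$, the product of the two braced factors should reproduce the joint normal density, while only the first factor changes with $\mu$.

```python
import math

def joint_density(xs, mu, sigma):
    """f(x1, ..., xn; mu) for a normal population with known sigma."""
    c = 1 / (sigma * math.sqrt(2 * math.pi))
    return math.prod(c * math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs)

def first_factor(xbar, n, mu, sigma):
    """Depends on the data only through x-bar (and on mu)."""
    return (math.sqrt(n) / (sigma * math.sqrt(2 * math.pi))
            * math.exp(-0.5 * ((xbar - mu) / (sigma / math.sqrt(n))) ** 2))

def second_factor(xs, sigma):
    """Does not involve mu."""
    n = len(xs)
    xbar = sum(xs) / n
    c = 1 / (sigma * math.sqrt(2 * math.pi))
    return (c ** (n - 1) / math.sqrt(n)
            * math.exp(-0.5 * sum(((x - xbar) / sigma) ** 2 for x in xs)))

xs = [4.1, 5.3, 3.8, 6.0, 5.2]
sigma = 1.5
xbar = sum(xs) / len(xs)
for mu in (3.0, 4.5, 6.0):
    lhs = joint_density(xs, mu, sigma)
    rhs = first_factor(xbar, len(xs), mu, sigma) * second_factor(xs, sigma)
    print(mu, lhs, rhs)
```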


With Definition 10.3 and Theorem 10.4 we have presented two ways of
checking whether a statistic $\hat{\theta}$ is a sufficient estimator of a given parameter $\theta$.
Usually, the factorization criterion of Theorem 10.4 leads to the easier solution,
but to show that $\hat{\theta}$ is not sufficient, it is almost always simpler to proceed by
Definition 10.3, as illustrated in Example 10.7.

Let us conclude this section with a very important property of sufficient
estimators. If $\hat{\theta}$ is a sufficient estimator of $\theta$, then every single-valued function
$y = u(\hat{\theta})$, not involving $\theta$, is also a sufficient estimator of $\theta$, and therefore of
$u(\theta)$, provided $y = u(\hat{\theta})$ can be solved to give the single-valued inverse $\hat{\theta} = w(y)$.
This result follows from Theorem 10.4, since we can write

$$f(x_1, x_2, \ldots, x_n; \theta) = g[w(y), \theta] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[w(y), \theta]$ depends only on $y$ and $\theta$. If we apply this result to Example
10.6, where we showed that $\hat{\theta} = x/n$ is a sufficient estimator of the Bernoulli
parameter $\theta$, it follows that the estimator $x = x_1 + x_2 + \cdots + x_n$ is also a
sufficient estimator of the binomial mean $\mu = n\theta$.


THEORETICAL EXERCISES 


1. Use the formula for the sampling distribution of $\tilde{x}$ on page 300 to show that
for random samples of size 3 the median is an unbiased estimator of the
parameter $\theta$ of the uniform density with $\alpha = \theta - \frac{1}{2}$ and $\beta = \theta + \frac{1}{2}$.

2. Refer to Example 8.4 to show that for random samples of size $n = 3$ the
median is a biased estimator of the parameter $\theta$ of an exponential population.

3. Given a random sample of size $n$ from a population which has the known
mean $\mu$ and the finite variance $\sigma^2$, show that $\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$ is an unbiased
estimator of $\sigma^2$.

4. Show that if $\hat{\theta}$ is an unbiased estimator of $\theta$ and $\operatorname{var}(\hat{\theta}) \neq 0$,
then $\hat{\theta}^2$ is not an unbiased estimator of $\theta^2$.

5. Use the results of Theorem 8.1 to show that $\bar{x}^2$ is an asymptotically unbiased
estimator of $\mu^2$.

6. Show that the Bayes estimator of Exercise 5 on page 329 is biased.

7. If $\hat{\theta}$ is an estimator of a parameter $\theta$, its bias is given by $b = E(\hat{\theta}) - \theta$. Show
that $E[(\hat{\theta} - \theta)^2] = \operatorname{var}(\hat{\theta}) + b^2$.

8. For what value of $k$ is $\hat{\theta} = k\bar{x}$ an unbiased estimator of the parameter $\theta$ of
the population given by

$$f(x) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere?} \end{cases}$$
9. Show that the sample proportion $x/n$ is a minimum variance unbiased estimator
of the binomial parameter $\theta$.

10. Show that the mean of a random sample of size $n$ is a minimum variance
unbiased estimator of the parameter $\lambda$ of a Poisson population.

11. The information about $\theta$ in a random sample of size $n$ is also given by

$$-n \cdot E\left[\frac{\partial^2 \ln f(x)}{\partial \theta^2}\right]$$

where $f(x)$ is the value of the population density at $x$, provided that the
extremes of the region for which $f(x) \neq 0$ do not depend on $\theta$. The derivation
of this formula takes the following steps:

(a) Differentiating the expressions on both sides of

$$\int f(x)\,dx = 1$$

with respect to $\theta$, show that

$$\int \frac{\partial \ln f(x)}{\partial \theta} \cdot f(x)\,dx = 0$$

by interchanging the order of integration and differentiation.

(b) Differentiating again with respect to $\theta$, show that

$$E\left[\left(\frac{\partial \ln f(x)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 \ln f(x)}{\partial \theta^2}\right]$$

12. Use the alternative formula for the information given in the preceding exercise
to rework Example 10.3.

13. If $\bar{x}_1$ is the mean of a random sample of size $n$ from a normal population
with the mean $\mu$ and the variance $\sigma_1^2$, and $\bar{x}_2$ is the mean of a random sample
of size $n$ from a normal population with the mean $\mu$ and the variance $\sigma_2^2$,
show that

(a) $w \cdot \bar{x}_1 + (1 - w) \cdot \bar{x}_2$, where $0 < w < 1$, is an unbiased estimator of $\mu$;
(b) the variance of this estimator is a minimum when

$$w = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$$


14. If $\bar{x}_1$ and $\bar{x}_2$ are the means of independent random samples of sizes $n_1$ and
$n_2$ from a normal population with the mean $\mu$ and the variance $\sigma^2$, show
that the variance of the unbiased estimator $w \cdot \bar{x}_1 + (1 - w) \cdot \bar{x}_2$ is a
minimum when $w = \frac{n_1}{n_1 + n_2}$.

15. If $x_1$, $x_2$, and $x_3$ are a random sample from a normal population with the
mean $\mu$ and the variance $\sigma^2$, what is the efficiency of the estimator
$\hat{\mu} = \frac{x_1 + 2x_2 + x_3}{4}$ relative to $\bar{x}$?

16. If $x$ is a binomial random variable and $\hat{\theta}_1 = \frac{x}{n}$ and $\hat{\theta}_2 = \frac{x+1}{n+2}$ are
estimators of the parameter $\theta$, for what values of $\theta$ is $E[(\hat{\theta}_2 - \theta)^2]$ less than
$E[(\hat{\theta}_1 - \theta)^2]$?

17. Since the variances of the mean and the mid-range are not affected if the
same constant is added to each observation, we can determine these variances
for random samples of size 3 from the uniform population

$$f(x) = \begin{cases} 1 & \text{for } \theta - \frac{1}{2} < x < \theta + \frac{1}{2} \\ 0 & \text{elsewhere} \end{cases}$$

by referring instead to the uniform population

$$f(x) = \begin{cases} 1 & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

(a) Show that $E(x) = \frac{1}{2}$, $E(x^2) = \frac{1}{3}$, and $\operatorname{var}(x) = \frac{1}{12}$ for this distribution,
so that for a random sample of size 3, $\operatorname{var}(\bar{x}) = \frac{1}{36}$.

(b) Use the results of Exercise 2 on page 302 and part (b) of Exercise 6 on
page 302 (or derive the necessary densities and joint density) to show
that for a random sample of size 3 from this distribution the order
statistics $y_1$ and $y_3$ have $E(y_1) = \frac{1}{4}$, $E(y_1^2) = \frac{1}{10}$, $E(y_3) = \frac{3}{4}$, $E(y_3^2) = \frac{3}{5}$,
and $E(y_1 y_3) = \frac{1}{5}$, so that $\operatorname{var}(y_1) = \frac{3}{80}$, $\operatorname{var}(y_3) = \frac{3}{80}$, and
$\operatorname{cov}(y_1, y_3) = \frac{1}{80}$.

(c) Use the results of part (b) and Theorem 4.14 to show that

$$E\left(\frac{y_1 + y_3}{2}\right) = \frac{1}{2} \quad \text{and} \quad \operatorname{var}\left(\frac{y_1 + y_3}{2}\right) = \frac{1}{40}$$

This shows that for random samples of size $n = 3$ from the given uniform
population the mid-range is more efficient than the mean.

18. Show that the estimator of Exercise 5 on page 329 is a consistent estimator
of the parameter $\theta$ of a binomial population.
19. Show that the estimator of Exercise 13 is consistent.

20. Show that the mean of a random sample of size $n$ from an exponential
population is a consistent estimator of its parameter $\theta$.

21. Suppose that we use the largest value of a random sample of size $n$ (namely,
the order statistic $y_n$) to estimate the parameter $\theta$ of the population

$$f(x) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere} \end{cases}$$

Check whether this estimator is (a) unbiased, and (b) consistent.

22. If $x_1$, $x_2$, and $x_3$ are independent random variables having Bernoulli distributions
with the same parameter $\theta$, show that $y = x_1 + 2x_2 + x_3$ is not a
sufficient estimator of $\theta$. (Hint: Consider special values of $x_1$, $x_2$, and $x_3$.)

23. If $x_1$ and $x_2$ are independent random variables having, respectively, binomial
distributions with the parameters $\theta$ and $n$, and $\theta$ and $m$, show that $\frac{x_1 + x_2}{n + m}$
is a sufficient estimator of $\theta$.

24. If $x_1$ and $x_2$ are independent random variables having Poisson distributions
with the same parameter $\lambda$, show that their mean is a sufficient estimator
of $\lambda$.

25. If $x_1, x_2, \ldots, x_n$ is a random sample of size $n$ from a geometric population,
show that $y = \sum_{i=1}^{n} x_i$ is a sufficient estimator of its parameter $\theta$.

26. Show that the estimator of Exercise 3 is a sufficient estimator of the variance
of a normal population with the known mean $\mu$.

27. With reference to Exercise 20, show that the sample mean is a sufficient
estimator of the exponential parameter $\theta$.

10.6 THE METHOD OF MOMENTS
as possible of the properties discussed in the preceding sections. In this section
and the next we shall treat two such methods, the method of moments, historically
one of the oldest methods of estimation, and the method of maximum likelihood.
Some further discussion of Bayesian estimation will be given in Section 10.8, and
another method, the method of least squares, will be taken up in Chapter 14.
The method of moments consists of equating the first few moments of a 
population to the corresponding moments of a sample, thus getting as many 
equations as are needed to solve for the unknown parameters of the population. 


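DEFINITION 10.4 The $k$th sample moment of a set of observations $x_1, x_2, \ldots, x_n$
is the mean of their $k$th powers and it is denoted by $m_k'$; symbolically,

$$m_k' = \frac{1}{n}\sum_{i=1}^{n} x_i^k$$

Thus, if a population has $p$ parameters, the method of moments consists of solving
the system of equations

$$m_k' = \mu_k' \qquad k = 1, 2, \ldots, p$$

for the $p$ parameters of the population.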


EXAMPLE 10.9 


Given a random sample of size $n$ from a gamma population, use the method of
moments to estimate its parameters $\alpha$ and $\beta$.


Solution 


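The system of equations we shall have to solve is

$$m_1' = \mu_1' \quad \text{and} \quad m_2' = \mu_2'$$

Since $\mu_1' = \alpha\beta$ and $\mu_2' = \alpha(\alpha + 1)\beta^2$ according to Theorem 6.2, we get

$$m_1' = \alpha\beta \quad \text{and} \quad m_2' = \alpha(\alpha + 1)\beta^2$$

and, solving these two equations for $\alpha$ and $\beta$, we find that the estimates of
the two parameters of the gamma distribution are

$$\hat{\alpha} = \frac{(m_1')^2}{m_2' - (m_1')^2} \quad \text{and} \quad \hat{\beta} = \frac{m_2' - (m_1')^2}{m_1'}$$

where $m_1' = \bar{x}$ and $m_2' = \frac{1}{n}\sum_{i=1}^{n} x_i^2$. Thus, in terms of the original observations,

$$\hat{\alpha} = \frac{n\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad \text{and} \quad \hat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n\bar{x}}$$

and we get the corresponding estimators by replacing the observed values $x_i$ and $\bar{x}$
with the corresponding random variables. ▲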
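The gamma estimates of Example 10.9 translate directly into a few lines of code. The Python sketch below is illustrative, and the function name is ours. To generate test data it relies on Python's `random.gammavariate(alpha, beta)`, which uses the same shape/scale parameterization as the text, with mean $\alpha\beta$.

```python
import random

def gamma_moment_estimates(xs):
    """Method-of-moments estimates (alpha_hat, beta_hat) for a gamma population."""
    n = len(xs)
    m1 = sum(xs) / n                     # first sample moment, m1' = x-bar
    m2 = sum(x * x for x in xs) / n      # second sample moment, m2'
    spread = m2 - m1 ** 2                # equals sum((x - x-bar)^2) / n
    return m1 ** 2 / spread, spread / m1

# A hand-checkable case: x-bar = 2.5 and sum((x - x-bar)^2) = 5.
print(gamma_moment_estimates([1.0, 2.0, 3.0, 4.0]))   # (5.0, 0.5)

# A simulated sample from a gamma population with alpha = 3 and beta = 2.
rng = random.Random(42)
data = [rng.gammavariate(3.0, 2.0) for _ in range(20_000)]
print(gamma_moment_estimates(data))
```

The hand-checkable case agrees with the formulas in terms of the original observations: $4(2.5)^2/5 = 5$ and $5/(4 \cdot 2.5) = 0.5$.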


In the example above we estimated the parameters of a specific population. 
It is important to note, however, that when the parameters to be estimated are 
the moments of the population, then the method of moments can be used without 
knowledge of the exact functional form of the population. 


10.7 THE METHOD OF MAXIMUM LIKELIHOOD


Clearly, it must be two or three, and assuming the missing letter to have been a
random selection, we obtain probabilities of


respectively, for getting the observed data. Therefore, if we choose as our estimate
of the total number of credit card billings the value which maximizes the probability
of the observed data, we get k = 3. We call this estimate a maximum likelihood
estimate, and the method by which it was obtained the method of maximum
likelihood.

Thus, the essential feature of the method of maximum likelihood is that we
look at the values of a random sample and then choose as our estimate of the
unknown population parameter the value for which the probability of obtaining
the observed data is a maximum. If the observed sample values are $x_1, x_2, \ldots, x_n$,
we can write in the discrete case (writing $X_i$ for the random variables and $x_i$ for
their observed values)

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(x_1, x_2, \ldots, x_n; \theta)$$

which is just the value of the joint probability distribution of the random variables
$X_1, X_2, \ldots, X_n$ at the sample point $(x_1, x_2, \ldots, x_n)$. Since the sample values
have been observed and are therefore fixed numbers, we regard
$f(x_1, x_2, \ldots, x_n; \theta)$ as the value of a function of the parameter $\theta$, referred to as
the likelihood function. A similar definition applies when the random sample
comes from a continuous population, but in that case $f(x_1, x_2, \ldots, x_n; \theta)$ is the
value of the joint probability density at the sample point $(x_1, x_2, \ldots, x_n)$.


DEFINITION 10.5 If $x_1, x_2, \ldots, x_n$ are the values of a random sample
from a population with the parameter $\theta$, the likelihood function of the sample
is given by

$$L(\theta) = f(x_1, x_2, \ldots, x_n; \theta)$$

for values of $\theta$ within a given domain. Here $f(x_1, x_2, \ldots, x_n; \theta)$ is the value
of the joint probability distribution or joint density function of the random
variables $x_1, x_2, \ldots, x_n$ at the observed sample point.


Thus, the method of maximum likelihood consists of maximizing the likelihood
function with respect to $\theta$, and we refer to the value of $\theta$ which maximizes the
likelihood function as the maximum likelihood estimate of $\theta$.


EXAMPLE 10.10 


Given $x$ "successes" in $n$ trials, find the maximum likelihood estimator of the
parameter $\theta$ of the binomial distribution.


Solution 
To find the value of $\theta$ which maximizes

$$L(\theta) = b(x; n, \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x}$$

it will be convenient to make use of the fact that the value of $\theta$ which
maximizes $L(\theta)$ will also maximize

$$\ln L(\theta) = \ln\binom{n}{x} + x \cdot \ln\theta + (n - x) \cdot \ln(1 - \theta)$$

Thus, we get

$$\frac{d[\ln L(\theta)]}{d\theta} = \frac{x}{\theta} - \frac{n - x}{1 - \theta}$$

and, equating this derivative to 0 and solving for $\theta$, we find that the likelihood
function has a maximum at $\theta = \frac{x}{n}$. Hence, the maximum likelihood
estimator of the parameter $\theta$ of the binomial distribution is $\hat{\theta} = \frac{x}{n}$. ▲
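The following Python sketch gives a brute-force check of this calculus result. It evaluates $\ln L(\theta)$ on a fine grid of values of $\theta$ and picks the largest. The data $x = 42$, $n = 120$ are borrowed from Example 10.14 purely for illustration, and the grid size and function names are arbitrary.

```python
import math

def binomial_log_likelihood(theta, x, n):
    """ln L(theta) = ln C(n, x) + x ln(theta) + (n - x) ln(1 - theta)."""
    return math.log(math.comb(n, x)) + x * math.log(theta) + (n - x) * math.log(1 - theta)

def grid_mle(x, n, steps=20_000):
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    grid = (k / steps for k in range(1, steps))
    return max(grid, key=lambda t: binomial_log_likelihood(t, x, n))

x, n = 42, 120
print(grid_mle(x, n), x / n)
```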


EXAMPLE 10.11 


If $x_1, x_2, \ldots, x_n$ are the values of a random sample from an exponential
population, find the maximum likelihood estimate of its parameter $\theta$.


Solution 
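Since the likelihood function is given by

$$L(\theta) = f(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) = \left(\frac{1}{\theta}\right)^{n} e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i}$$

differentiation of $\ln L(\theta)$ with respect to $\theta$ yields

$$\frac{d[\ln L(\theta)]}{d\theta} = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n} x_i$$

Equating this derivative to zero and solving for $\theta$, we get the maximum
likelihood estimate

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$

▲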
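As a numerical sanity check on Example 10.11, the Python sketch below uses made-up observations. It confirms that the derivative of $\ln L(\theta)$ vanishes at $\hat{\theta} = \bar{x}$ and that the log-likelihood is smaller at nearby values of $\theta$. The density is taken in the form $f(x; \theta) = \frac{1}{\theta}e^{-x/\theta}$ used in the example, and the helper names are illustrative.

```python
import math

def exp_log_likelihood(theta, xs):
    """ln L(theta) = -n ln(theta) - sum(x) / theta."""
    return -len(xs) * math.log(theta) - sum(xs) / theta

def exp_score(theta, xs):
    """d ln L / d theta = -n / theta + sum(x) / theta^2."""
    return -len(xs) / theta + sum(xs) / theta ** 2

xs = [1.2, 0.4, 3.1, 0.9, 2.4]
theta_hat = sum(xs) / len(xs)          # the sample mean
print(theta_hat, exp_score(theta_hat, xs))
for t in (theta_hat - 0.3, theta_hat, theta_hat + 0.3):
    print(round(t, 3), round(exp_log_likelihood(t, xs), 4))
```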
Let us also consider an example in which the methods of elementary calculus
cannot be used to find the maximum value of the likelihood function.


EXAMPLE 10.12 


If $x_1, x_2, \ldots, x_n$ are the values of a random sample from a continuous
uniform population with $\alpha = 0$ and $\beta = \theta$, find the maximum likelihood
estimator of $\theta$.


Solution 


The likelihood function is given by

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \left(\frac{1}{\theta}\right)^{n}$$

for $\theta$ greater than or equal to the largest of the $x_i$, and 0 otherwise. Since the values
of this function increase when $\theta$ decreases, we must make $\theta$ as small
as possible, and it follows that the maximum likelihood estimator of $\theta$ is
$\hat{\theta} = y_n$, the $n$th order statistic. ▲
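In Example 10.12 the maximum sits at a boundary rather than at a zero of the derivative, so evaluating the likelihood directly is instructive. The Python sketch below uses made-up data and an illustrative helper name. It computes $L(\theta)$ on either side of the largest observation: the likelihood is 0 below $y_n$ and decreases above it.

```python
def uniform_likelihood(theta, xs):
    """L(theta) = theta^(-n) if theta >= max(xs), and 0 otherwise."""
    return theta ** -len(xs) if theta >= max(xs) else 0.0

xs = [0.7, 2.9, 1.4, 2.2]
y_n = max(xs)                      # the nth order statistic
for t in (2.5, y_n, 3.0, 4.0):
    print(t, uniform_likelihood(t, xs))

# A grid search over candidate values of theta lands on the largest observation.
grid = [k / 100 for k in range(1, 601)]
print(max(grid, key=lambda t: uniform_likelihood(t, xs)))
```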


The method of maximum likelihood can also be used for the simultaneous
estimation of several parameters of a given population; in that case we must find
the values of the parameters which together maximize the likelihood function.


EXAMPLE 10.13 


Given a random sample of size $n$ from a normal population with the mean $\mu$
and the variance $\sigma^2$, find joint maximum likelihood estimates of these two
parameters.


Solution 
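Since the likelihood function is given by

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} n(x_i; \mu, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}$$

partial differentiation of $\ln L(\mu, \sigma^2)$ with respect to $\mu$ and $\sigma^2$ yields

$$\frac{\partial[\ln L(\mu, \sigma^2)]}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$$

and

$$\frac{\partial[\ln L(\mu, \sigma^2)]}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2$$

Equating the first of these two partial derivatives to zero and solving for $\mu$,
we get

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$

and equating the second of these partial derivatives to zero and solving for
$\sigma^2$ after substituting $\mu = \bar{x}$, we get

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

▲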
n - 


Since $\hat{\sigma}^2$ does not equal $s^2$ in the preceding example, we find that maximum
likelihood estimators need not be unbiased. It should also be observed that we
did not prove that $\hat{\sigma}$ is a maximum likelihood estimate of $\sigma$, but only that $\hat{\sigma}^2$ is
a maximum likelihood estimate of $\sigma^2$. However, it can be shown (see the reference
on page 363) that maximum likelihood estimators have the invariance property
that if $\hat{\theta}$ is a maximum likelihood estimator of $\theta$ and the function given by $g(\theta)$
is continuous, then $g(\hat{\theta})$ is also a maximum likelihood estimator of $g(\theta)$. From
this we can conclude that

$$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

is also a maximum likelihood estimate of $\sigma$.

Although we maximized the logarithm of the likelihood function instead
of the likelihood function itself in all of the examples in which we were able
to use calculus, this is by no means necessary; it simply happened to be
convenient in each case.
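The estimates of Example 10.13 and the effect of the divisor can be seen on a small data set. The Python sketch below uses five observations invented for illustration, and its helper names are ours. Both partial derivatives vanish at $(\hat{\mu}, \hat{\sigma}^2)$, and $\hat{\sigma}^2$, which divides by $n$, is smaller than $s^2$, which divides by $n - 1$. Taking the square root of $\hat{\sigma}^2$ gives the maximum likelihood estimate of $\sigma$ by the invariance property.

```python
import math

def normal_mle(xs):
    """Joint maximum likelihood estimates (mu_hat, sigma2_hat); divisor n, not n - 1."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

def normal_score(mu, sigma2, xs):
    """Partial derivatives of ln L(mu, sigma^2) with respect to mu and sigma^2."""
    n = len(xs)
    d_mu = sum(x - mu for x in xs) / sigma2
    d_sigma2 = -n / (2 * sigma2) + sum((x - mu) ** 2 for x in xs) / (2 * sigma2 ** 2)
    return d_mu, d_sigma2

xs = [4.0, 7.0, 5.0, 8.0, 6.0]
mu_hat, sigma2_hat = normal_mle(xs)
s2 = sigma2_hat * len(xs) / (len(xs) - 1)     # the unbiased sample variance
print(mu_hat, sigma2_hat, s2)                  # 6.0 2.0 2.5
print(normal_score(mu_hat, sigma2_hat, xs))    # (0.0, 0.0)
print(math.sqrt(sigma2_hat))                   # MLE of sigma by invariance
```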


THEORETICAL EXERCISES 


1. Use the method of moments to find an estimator for the parameter 6 of the 
uniform density of Example 10.12. 
2. If $x_1, x_2, \ldots, x_n$ are the values of a random sample of size $n$ from a
population having the density

$$f(x; \theta) = \begin{cases} \dfrac{2(\theta - x)}{\theta^2} & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere} \end{cases}$$

find an estimator for $\theta$ by the method of moments.

3. If $x_1, x_2, \ldots, x_n$ are the values of a random sample of size $n$ from a
Poisson population with the parameter $\lambda$, find an estimate of $\lambda$ using

(a) the method of moments;
(b) the method of maximum likelihood.

4. If $x_1, x_2, \ldots, x_n$ are the values of a random sample of size $n$ from a beta
population with $\beta = 1$, find an estimate of the parameter $\alpha$ using

(a) the method of moments;
(b) the method of maximum likelihood.

5. Given a random sample of size $n$ from a normal population with the known
mean $\mu$, find the maximum likelihood estimator of $\sigma$.

6. If $x_1, x_2, \ldots, x_n$ are the values of a random sample of size $n$ from a
geometric population, find an estimate of its parameter $\theta$ using

(a) the method of moments;
(b) the method of maximum likelihood.

7. Given a random sample of size $n$ from a population having the density

$$f(x; \theta) = \begin{cases} e^{-(x - \theta)} & \text{for } x > \theta \\ 0 & \text{elsewhere} \end{cases}$$

find an estimator of the parameter $\theta$ by the method of maximum likelihood.

8. Given a random sample of size $n$ from a population having the density

$$f(x; \theta) = \begin{cases} (\theta + 1)x^{\theta} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find an estimator of $\theta$ using

(a) the method of moments;
(b) the method of maximum likelihood.

9. Among $N$ independent random variables having identical binomial distributions
with the parameters $\theta$ and $n = 2$, $n_0$ take on the value 0, $n_1$ take on
the value 1, and $n_2$ take on the value 2. Find an estimate of $\theta$ using

(a) the method of moments;
(b) the method of maximum likelihood.
10. Given a random sample of size $n$ from a gamma population with the known
parameter $\alpha$, find
(a) the maximum likelihood estimator of $\beta$;
(b) the maximum likelihood estimator of $\tau = (2\beta - 1)^2$.

11. Given a random sample of size $n$ from a population having a uniform density,
find simultaneous maximum likelihood estimators of the parameters $\alpha$ and $\beta$.

12. Given a random sample of size $n$ from a population having the density

$$f(x; \delta, \varphi) = \begin{cases} \dfrac{1}{\varphi} e^{-(x - \delta)/\varphi} & \text{for } x > \delta \\ 0 & \text{elsewhere} \end{cases}$$

where $-\infty < \delta < \infty$ and $0 < \varphi < \infty$, find simultaneous maximum likelihood
estimates of $\delta$ and $\varphi$.

13. Given independent random samples $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ from
two normal populations having the means $\mu_1 = \alpha + \beta$ and $\mu_2 = \alpha - \beta$ and
the common variance $\sigma^2 = 1$, find simultaneous maximum likelihood
estimators for $\alpha$ and $\beta$.


10.8 BAYESIAN ESTIMATORS 


So far we have assumed in this chapter that the parameters which we want to 
estimate are unknown constants; in Bayesian estimation the parameters are looked 
upon as random variables having prior distributions, usually reflecting the strength 
of one's belief about the possible values they can assume. In Section 9.6, we
already met a problem of Bayesian estimation: the parameter was that of the
uniform density whose values are $\frac{1}{\theta}$ for the interval from 0 to $\theta$, and 0 elsewhere,
and its prior distribution was a gamma distribution with $\alpha = 2$ and $\beta = 1$.

The main problem of Bayesian estimation is that of combining prior feelings
about a parameter with direct sample evidence, and in Example 9.9 we accomplished
this by determining $\varphi(\theta \mid x)$, the conditional density of $\theta$ given the observed
value $x$. In contrast to the prior distribution of $\theta$, this conditional distribution,
which also reflects the direct sample evidence, is called the posterior distribution of $\theta$. In
general, if $h(\theta)$ is the value of the prior distribution of $\theta$ at $\theta$, and we want to
combine the information which it conveys with direct sample evidence about $\theta$,
say, the value of a statistic $w = u(x_1, x_2, \ldots, x_n)$, we determine the posterior
distribution of $\theta$ by means of the formula

$$\varphi(\theta \mid w) = \frac{f(\theta, w)}{g(w)} = \frac{h(\theta) \cdot f(w \mid \theta)}{g(w)}$$
Here $f(w \mid \theta)$ is a value of the sampling distribution of $w$ for a given value of $\theta$,
$f(\theta, w)$ is a value of the joint distribution of $\theta$ and $w$, and $g(w)$ is a value of the
marginal distribution of $w$. Note that the above formula for $\varphi(\theta \mid w)$ is, in fact, an
extension of Bayes' theorem, Theorem 2.13, to the continuous case; hence, the term
"Bayesian estimation."

Once the posterior distribution of a parameter has been obtained, it can be 
used to make estimates, as in Example 9.9, or it can be used to make probability 
statements about the parameter, as will be illustrated in Example 10.15. Although 
the method we have described has extensive applications, we shall limit our 
discussion here to inferences about the parameter $\theta$ of a binomial population
and the parameter $\mu$ of a normal population; inferences about the parameter $\lambda$
of a Poisson population are treated in Exercise 4 on page 362.


THEOREM 10.5 If $x$ is a binomial random variable and the prior distribution
of $\theta$ is a beta distribution with given values of $\alpha$ and $\beta$, then the posterior
distribution of $\theta$ given the observed value $x$ is a beta distribution with the parameters
$x + \alpha$ and $n - x + \beta$.


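Proof. For a given value of $\theta$ we have

$$f(x \mid \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x} \quad \text{for } x = 0, 1, 2, \ldots, n$$

$$h(\theta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1} & \text{for } 0 < \theta < 1 \\ 0 & \text{elsewhere} \end{cases}$$

and, hence,

$$f(\theta, x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \times \binom{n}{x}\theta^{x}(1-\theta)^{n-x} = \binom{n}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)}\theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}$$

for $0 < \theta < 1$ and $x = 0, 1, 2, \ldots, n$, and $f(\theta, x) = 0$ elsewhere. To obtain
the marginal distribution of $x$, let us make use of the fact that the integral of
the beta density from 0 to 1 equals 1, namely, that

$$\int_0^1 \theta^{\alpha-1}(1-\theta)^{\beta-1}\,d\theta = \frac{\Gamma(\alpha)\cdot\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

Thus, we get

$$g(x) = \binom{n}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)} \cdot \frac{\Gamma(\alpha+x)\cdot\Gamma(n-x+\beta)}{\Gamma(n+\alpha+\beta)}$$

for $x = 0, 1, \ldots, n$, and, hence,

$$\varphi(\theta \mid x) = \frac{\Gamma(n+\alpha+\beta)}{\Gamma(\alpha+x)\cdot\Gamma(n-x+\beta)}\theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}$$

for $0 < \theta < 1$, and $\varphi(\theta \mid x) = 0$ elsewhere. As can be seen by inspection,
this is a beta density with the parameters $x + \alpha$ and $n - x + \beta$.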


To make use of this theorem, let us refer to the result that (under very
general conditions) the mean of the posterior distribution minimizes the Bayes
risk when the loss function is quadratic, namely, when the loss function is given
by

$$L[d(x), \theta] = c[d(x) - \theta]^2$$

where $c$ is a positive constant. Note that this is the kind of loss function which
we used in Example 9.9. Since the posterior distribution of $\theta$ of Theorem 10.5
is a beta distribution with the parameters $x + \alpha$ and $n - x + \beta$, it follows from
Theorem 6.5 that

$$E(\theta \mid x) = \frac{x + \alpha}{\alpha + \beta + n}$$

is a value of an estimator of $\theta$ which minimizes the Bayes risk when the loss
function is quadratic and the prior distribution of $\theta$ is of the given form.
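Theorem 10.5 and the formula for $E(\theta \mid x)$ can be checked numerically without any algebra. The Python sketch below multiplies the beta prior by the binomial likelihood, normalizes by numerical integration, and compares the resulting posterior mean with $(x + \alpha)/(\alpha + \beta + n)$. The midpoint-rule grid and the test values $x = 7$, $n = 20$, $\alpha = 2$, $\beta = 3$ are arbitrary choices made for illustration, and so are the function names.

```python
import math

def beta_pdf(t, a, b):
    """Beta density with parameters a and b at t, for 0 < t < 1."""
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b)) * t ** (a - 1) * (1 - t) ** (b - 1)

def posterior_mean_numeric(x, n, a, b, steps=20_000):
    """Approximate E(theta | x) from h(theta) * f(x | theta) with the midpoint rule."""
    dt = 1.0 / steps
    num = den = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        weight = beta_pdf(t, a, b) * math.comb(n, x) * t ** x * (1 - t) ** (n - x)
        num += t * weight * dt
        den += weight * dt      # approximates g(x), the marginal probability of x
    return num / den

def posterior_mean_formula(x, n, a, b):
    """(x + alpha) / (alpha + beta + n), the mean of the beta posterior."""
    return (x + a) / (a + b + n)

print(posterior_mean_numeric(7, 20, 2, 3), posterior_mean_formula(7, 20, 2, 3))
```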


EXAMPLE 10.14 


Find the mean of the posterior distribution as an estimate of the "true" probability
of a success, if 42 successes are obtained in 120 binomial trials and the prior
distribution of $\theta$ is a beta distribution with $\alpha = \beta = 40$.


Solution 


Substituting $x = 42$, $n = 120$, $\alpha = 40$, and $\beta = 40$ into the above formula
for $E(\theta \mid x)$, we get

$$E(\theta \mid 42) = \frac{42 + 40}{40 + 40 + 120} = 0.41$$

Note that without knowledge of the prior distribution of $\theta$, the minimum
variance unbiased estimate of $\theta$ (see Exercise 9 on page 346) would be the
sample proportion

$$\hat{\theta} = \frac{x}{n} = \frac{42}{120} = 0.35$$

▲
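The arithmetic of Example 10.14 can be written as a minimal Python sketch; the helper name is ours. The update of Theorem 10.5 turns the prior parameters into the posterior parameters, and the posterior mean follows from Theorem 6.5.

```python
def beta_binomial_update(x, n, alpha, beta):
    """Posterior beta parameters (x + alpha, n - x + beta) from Theorem 10.5."""
    return x + alpha, n - x + beta

a_post, b_post = beta_binomial_update(42, 120, 40, 40)
posterior_mean = a_post / (a_post + b_post)   # (x + alpha) / (alpha + beta + n)
sample_proportion = 42 / 120
print(a_post, b_post, posterior_mean, sample_proportion)   # 82 118 0.41 0.35
```

The prior, whose mean is $40/80 = 0.5$, pulls the estimate up from the sample proportion 0.35.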


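THEOREM 10.6 If $\bar{x}$ is the mean of a random sample of size $n$ from a normal
population with the known variance $\sigma^2$, and the prior distribution of $\mu$ is
a normal distribution with the mean $\mu_0$ and the variance $\sigma_0^2$, then the
posterior distribution of $\mu$ given the observed value $\bar{x}$ is a normal distribution with the
mean $\mu_1$ and the variance $\sigma_1^2$, where

$$\mu_1 = \frac{n\bar{x}\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2} \quad \text{and} \quad \frac{1}{\sigma_1^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}$$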


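Proof. For a given value of $\mu$ we have

$$f(\bar{x} \mid \mu) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)^2} \quad \text{for } -\infty < \bar{x} < \infty$$

according to Theorem 8.4, and

$$h(\mu) = \frac{1}{\sigma_0\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2} \quad \text{for } -\infty < \mu < \infty$$

so that

$$\varphi(\mu \mid \bar{x}) = \frac{h(\mu) \cdot f(\bar{x} \mid \mu)}{g(\bar{x})} = \frac{\sqrt{n}}{2\pi\sigma\sigma_0 \, g(\bar{x})} e^{-\frac{1}{2}\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)^2 - \frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2} \quad \text{for } -\infty < \mu < \infty$$

Now, if we collect powers of $\mu$ in the exponent of $e$, we get

$$-\frac{1}{2}\left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\mu^2 + \left(\frac{n\bar{x}}{\sigma^2} + \frac{\mu_0}{\sigma_0^2}\right)\mu - \frac{1}{2}\left(\frac{n\bar{x}^2}{\sigma^2} + \frac{\mu_0^2}{\sigma_0^2}\right)$$

and if we let

$$\frac{1}{\sigma_1^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \quad \text{and} \quad \mu_1 = \frac{n\bar{x}\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2}$$

factor out $-\frac{1}{2\sigma_1^2}$, and complete the square, the exponent of $e$ in the
expression for $\varphi(\mu \mid \bar{x})$ becomes

$$-\frac{1}{2\sigma_1^2}(\mu - \mu_1)^2 + R$$

where $R$ involves $n$, $\bar{x}$, $\mu_0$, $\sigma$, and $\sigma_0$, but not $\mu$. Thus, the posterior
distribution of $\mu$ becomes

$$\varphi(\mu \mid \bar{x}) = \frac{\sqrt{n}\, e^{R}}{2\pi\sigma\sigma_0 \, g(\bar{x})} e^{-\frac{1}{2\sigma_1^2}(\mu-\mu_1)^2} \quad \text{for } -\infty < \mu < \infty$$

which is easily identified as a normal distribution with the mean $\mu_1$ and
the variance $\sigma_1^2$. Hence, it can be written as

$$\varphi(\mu \mid \bar{x}) = \frac{1}{\sigma_1\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{\mu-\mu_1}{\sigma_1}\right)^2} \quad \text{for } -\infty < \mu < \infty$$

where $\mu_1$ and $\sigma_1$ are defined above. Note that we did not have to determine
$g(\bar{x})$, as it was absorbed in the constant in the final result.


EXAMPLE 10.15


A distributor of soft-drink vending machines feels that in a supermarket one of
his machines will sell on the average $\mu_0 = 738$ drinks per week. Of course, the
mean will vary somewhat from market to market, and the distributor feels that
this variation is measured by the standard deviation $\sigma_0 = 13.4$. So far as a machine
placed in a particular market is concerned, the number of drinks sold will vary
from week to week, and this variation is measured by the standard deviation
$\sigma = 42.5$. If one of the distributor's machines put into a new supermarket averaged
$\bar{x} = 692$ during the first ten weeks, what is the probability (the distributor's
personal probability) that for this market the value of $\mu$ is actually between 700
and 720?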


Solution 


Assuming that the population sampled is approximately normal and that 
it is reasonable to treat the prior distribution of н. as a normal distribution 
with the mean uo = 738 and the standard deviation оо = 13.4, substitution 
into the two formulas of Theorem 10.6 yields | 


_ 10 - 692(13.4)? + 738(42.5)? 7i5 
==—————————= 
10(13.4)? + (42.5)? 


1 
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and 
Hie tthe quads = = 0.0111 
cl (4215) (134 


so that a? = 90.0 and g, = 9.5. Now, the answer to our question is given 
by the area of the shaded region of Figure 10.1, namely, the area under the 
standard normal curve between 


700 -71 ig 
EB. ue Io e 


9.5 9.5 0.53 


Thus, the probability that р is between 700 and 720 is 0.4429 4- 0.2019 — 
0.6448, or approximately 0.64. A 


Figure 10.1 Posterior distribution of $\mu$ (shaded region between $z = -1.58$ and $z = 0.53$).
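The computation of Example 10.15 can be reproduced in a few lines. This Python sketch applies the two formulas of Theorem 10.6 and uses `math.erf` for the standard normal distribution function instead of a table; the helper names are ours. Because it does not round $\mu_1$ and $\sigma_1$ to 715 and 9.5 before standardizing, its probability differs slightly from the table-based 0.6448, though it still comes out at approximately 0.64.

```python
import math

def normal_posterior(xbar, n, sigma, mu0, sigma0):
    """Posterior mean mu1 and standard deviation sigma1 from Theorem 10.6."""
    mu1 = (n * xbar * sigma0**2 + mu0 * sigma**2) / (n * sigma0**2 + sigma**2)
    precision = n / sigma**2 + 1 / sigma0**2      # this is 1 / sigma1^2
    return mu1, math.sqrt(1 / precision)

def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu1, sigma1 = normal_posterior(xbar=692, n=10, sigma=42.5, mu0=738, sigma0=13.4)
prob = std_normal_cdf((720 - mu1) / sigma1) - std_normal_cdf((700 - mu1) / sigma1)
print(round(mu1, 2), round(sigma1, 2), round(prob, 4))
```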


THEORETICAL EXERCISES 
1. Making use of the results of Exercise 21 on page 219, show that the mean of
the posterior distribution of $\theta$ given on page 358 can be written as

$$E(\theta \mid x) = w \cdot \frac{x}{n} + (1 - w) \cdot \theta_0$$

namely, as a weighted mean of $\frac{x}{n}$ and $\theta_0$, where $\theta_0$ and $\sigma_0^2$ are the mean and
the variance of the prior beta distribution of $\theta$, and

$$w = \frac{n}{n + \dfrac{\theta_0(1 - \theta_0)}{\sigma_0^2} - 1}$$

2. In Example 10.14, the prior distribution of the parameter $\theta$ of the binomial
distribution was a beta distribution with the parameters $\alpha = \beta = 40$. Use
Theorem 6.5 to find the mean and the variance of this prior distribution and
describe its shape.

3. Show that the mean of the posterior distribution of $\mu$ given in Theorem 10.6
can be written as

$$\mu_1 = w \cdot \bar{x} + (1 - w) \cdot \mu_0$$

namely, as a weighted mean of $\bar{x}$ and $\mu_0$, where

$$w = \frac{n}{n + \dfrac{\sigma^2}{\sigma_0^2}}$$

4. If $x$ has a Poisson distribution and the prior distribution of its parameter $\lambda$
is a gamma distribution with the parameters $\alpha$ and $\beta$, show that
(a) the posterior distribution of $\lambda$ given the observed value $x$ is a gamma distribution with
the parameters $\alpha + x$ and $\frac{\beta}{\beta + 1}$;
(b) the mean of this posterior distribution of $\lambda$ is

$$\mu_1 = \frac{\beta(\alpha + x)}{\beta + 1}$$

APPLIED EXERCISES 


5. The output of a certain transistor production line is checked daily by inspecting a sample of 100 units. Over a long period of time, the process has maintained a yield of 80 percent, namely, a proportion defective of 20 percent, and the variation of the proportion defective from day to day is measured by a standard deviation of 0.04. If on a certain day the sample contains 38 defectives, find the mean of the posterior distribution of $\Theta$ as an estimate of that day's proportion defective. Assume that the prior distribution of $\Theta$ is a beta distribution.


6. Records of a university (collected over many years) show that on the average 74 percent of all incoming freshmen have I.Q.'s of at least 115. Of course, the percentage varies somewhat from year to year and this variation is measured by a standard deviation of 3 percent. If a sample check of 30 freshmen entering the university in 1986 showed that only 18 of them have I.Q.'s of at least 115, estimate the true proportion of students with I.Q.'s of at least 115 in that freshman class using
(a) only the prior information;
(b) only the direct information;
(c) the results of Exercise 1 to combine the prior information with the direct information.


7. With reference to Example 10.15, find $P(712 < \mu < 725 \mid \bar{x} = 692)$.


8. A history professor is making up a final examination which is to be given to a very large group of students. His feelings about the average grade they should get are expressed subjectively by a normal distribution with the mean $\mu_0 = 65.2$ and the standard deviation $\sigma_0 = 1.5$.

(a) What prior probability does the professor assign to the actual average 
grade being somewhere on the interval from 63.0 to 68.0? 

(b) What posterior probability would he assign to this event if the 
examination is tried on a random sample of 40 students whose grades 
have a mean of 72.9 and a standard deviation of 7.4? Use $s = 7.4$ as an estimate of $\sigma$.

9. An office manager feels that for a certain kind of business the daily number 
of incoming telephone calls is a random variable having a Poisson distribution, 
whose parameter has a prior gamma distribution with $\alpha = 50$ and $\beta = 2$.
Being told that one such business had 112 incoming calls on a given day, what 
would be his estimate of that particular business' average daily number of 
incoming calls if he considers 
(a) only the prior information; 

(b) only the direct information; 

(c) both kinds of information and the theory of Exercise 4? 
Interval Estimation


11.1 INTRODUCTION 


Since point estimates will rarely equal the parameters they are supposed to estimate, it is usually desirable to give ourselves some leeway by using interval estimates. An interval estimate of a parameter $\theta$ is an interval of the form $\hat\theta_1 < \theta < \hat\theta_2$, where $\hat\theta_1$ and $\hat\theta_2$ depend on the value taken on by the estimator $\hat\Theta$ in a given sample and also on the sampling distribution of $\hat\Theta$. For instance, if we are asked to estimate the average I.Q. of a very large group of students on the basis of a random sample, we might arrive at the interval estimate $109 < \mu < 117$ on the basis of the sample mean $\bar{x} = 113$ as well as knowledge about the sampling distribution of $\bar{X}$.

Since different samples will generally yield different values of $\hat\Theta$ and, hence, different values of $\hat\theta_1$ and $\hat\theta_2$, these endpoints of the interval are values of corresponding random variables $\hat\Theta_1$ and $\hat\Theta_2$. Based on the sampling distribution of $\hat\Theta$, we can thus state the probability that such an interval will actually contain the parameter it is supposed to estimate. In other words, we can use the sampling distribution of $\hat\Theta$ to choose $\hat\Theta_1$ and $\hat\Theta_2$ such that, for any specified probability $1 - \alpha$, where $0 < \alpha < 1$,

$$P(\hat\Theta_1 < \theta < \hat\Theta_2) = 1 - \alpha$$
Such an interval $\hat\theta_1 < \theta < \hat\theta_2$, computed for a particular sample, is called a $(1 - \alpha)100\%$ confidence interval, the fraction $1 - \alpha$ is called the confidence coefficient or the degree of confidence, and the endpoints $\hat\theta_1$ and $\hat\theta_2$ are called the lower and upper confidence limits. For instance, when $\alpha = 0.05$ the degree of confidence is 0.95 and we get a 95% confidence interval.

Let us make it clear at this point that confidence intervals for given parameters are not unique. This is illustrated by Exercises 2 and 3 on page 373, and also in Section 11.2, where we show that, based on a single random sample, there exist numerous confidence intervals for $\mu$, all having the same degree of confidence. As in the case of point estimation, methods of obtaining confidence intervals must, thus, be judged by their various statistical properties. For instance, one desirable property is to have the length of a $(1 - \alpha)100\%$ confidence interval as short as possible. Another desirable property is to have the expected length, $E(\hat\Theta_2 - \hat\Theta_1)$, as small as possible.


11.2 CONFIDENCE INTERVALS FOR MEANS 


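Since $\bar{X}$ is a sufficient estimator of the mean of a normal population with the known variance $\sigma^2$ (see Example 10.8 on page 365), let us use it to derive a confidence interval for $\mu$ from such a population. By Theorem 8.4, the sampling distribution of $\bar{X}$ for a random sample of size $n$ from a normal population with the mean $\mu$ and the variance $\sigma^2$ is a normal distribution with the mean $\mu_{\bar{X}} = \mu$ and the variance $\sigma_{\bar{X}}^2 = \sigma^2/n$. Thus, we can write

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$

where

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

and $z_{\alpha/2}$ is such that the integral of the standard normal density from $z_{\alpha/2}$ to $\infty$ equals $\alpha/2$. It follows that

$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

or, equivalently,

$$P\left(\bar{X} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

and it can be seen that the random variables $\hat\Theta_1$ and $\hat\Theta_2$, defined in Section 11.1, are

$$\hat\Theta_1 = \bar{X} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \hat\Theta_2 = \bar{X} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$

For a given value of $\alpha$, they depend only on known constants and the observed sample, and the lower and upper confidence limits can now be obtained by letting $\bar{X}$ take on its sample value $\bar{x}$. Thus, we have shown that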


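THEOREM 11.1 (Confidence interval for $\mu$, $\sigma$ known) If $\bar{x}$ is the value of the mean of a random sample of size $n$ from a normal population with the known variance $\sigma^2$, a $(1 - \alpha)100\%$ confidence interval for $\mu$ is given by

$$\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$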


By virtue of the central limit theorem, this result can also be used for random samples from non-normal populations with the known variance $\sigma^2$, provided $n$ is sufficiently large, that is, when $n > 30$.


EXAMPLE 11.1 


If a random sample of size $n = 20$ from a normal population with the variance $\sigma^2 = 225$ has the mean $\bar{x} = 64.3$, construct a 95% confidence interval for the population mean $\mu$.


Solution


For $\alpha = 0.05$, we find from Table III that $z_{0.025} = 1.96$. Therefore, the 95% confidence interval for $\mu$ is

$$64.3 - 1.96\cdot\frac{15}{\sqrt{20}} < \mu < 64.3 + 1.96\cdot\frac{15}{\sqrt{20}}$$

which reduces to

$$57.7 < \mu < 70.9$$

▲
The interval which we obtained in the preceding example is a 95% confidence interval for the mean of the population. Of course, the double inequality $57.7 < \mu < 70.9$ must be either true or false, but if we had to bet, 95 to 5 (or 19 to 1) would be fair odds that it is true. These odds are fair because the method by which the interval was obtained works, so to speak, 95% of the time.
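The "works 95% of the time" statement can be made concrete with a small simulation. This is a minimal sketch, not the book's procedure: it computes the interval of Theorem 11.1 for the data of Example 11.1, then repeatedly draws samples of size 20 from a normal population whose true mean is, hypothetically, set to 64.3 (with $\sigma = 15$) and counts how often the interval covers that mean. The helper names `z_interval` and `coverage` are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(xbar, sigma, n, conf=0.95):
    """(1 - alpha)100% interval for mu when sigma is known (Theorem 11.1)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half


# Example 11.1: n = 20, sigma^2 = 225, xbar = 64.3
lo, hi = z_interval(64.3, 15.0, 20)
print(f"{lo:.1f} < mu < {hi:.1f}")


def coverage(mu=64.3, sigma=15.0, n=20, reps=10_000, seed=1):
    """Fraction of simulated samples whose interval contains the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        lo_, hi_ = z_interval(xbar, sigma, n)
        hits += lo_ < mu < hi_
    return hits / reps


print(coverage())  # close to 0.95, up to simulation noise
```

Any single interval either covers $\mu$ or it does not; the 95% describes the long-run behavior of the method, which is what the simulated fraction estimates.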

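The fact that confidence intervals for given parameters are not unique is readily seen by writing the $(1 - \alpha)100\%$ confidence interval of Theorem 11.1 as

$$\bar{x} - z_{2\alpha/3}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/3}\cdot\frac{\sigma}{\sqrt{n}}$$

or as the one-sided $(1 - \alpha)100\%$ confidence interval

$$\mu < \bar{x} + z_{\alpha}\cdot\frac{\sigma}{\sqrt{n}}$$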


We could also have based the confidence interval, say, on the sample median 
instead of the sample mean. 
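To see why the equal-tail choice is preferred among intervals of the form $\bar{x} - z_a\,\sigma/\sqrt{n} < \mu < \bar{x} + z_b\,\sigma/\sqrt{n}$ with $a + b = \alpha$, one can compare their lengths, measured in units of $\sigma/\sqrt{n}$. The sketch below uses `statistics.NormalDist` for $z_p$ (the value with upper-tail probability $p$). The tail splits tried are arbitrary examples, and the function names are hypothetical.

```python
from statistics import NormalDist


def z_upper(p):
    """z_p: the value with upper-tail probability p under the standard normal."""
    return NormalDist().inv_cdf(1 - p)


def width_factor(alpha, a):
    """Length of xbar - z_a*sigma/sqrt(n) < mu < xbar + z_b*sigma/sqrt(n),
    with b = alpha - a, in units of sigma/sqrt(n)."""
    return z_upper(a) + z_upper(alpha - a)


alpha = 0.05
for a in (alpha / 2, alpha / 3, 2 * alpha / 3, alpha / 10):
    print(f"a = {a:.4f}: width = {width_factor(alpha, a):.3f} * sigma/sqrt(n)")
```

All of these intervals have the same degree of confidence. The equal split $a = b = \alpha/2$ gives the shortest one (about $3.920\,\sigma/\sqrt{n}$ for $\alpha = 0.05$), because the normal density is symmetric and unimodal.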

In order to construct an approximate $(1 - \alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is unknown but $n > 30$, we replace $\sigma$ by the value of the sample standard deviation $s$ and proceed as above. However, when we are dealing with a sample from a normal population and $n < 30$, a $(1 - \alpha)100\%$ confidence interval for $\mu$ can be constructed by making use of the fact that the random variable

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a $t$ distribution with $n - 1$ degrees of freedom (see Theorem 8.12). Hence,

$$P(-t_{\alpha/2,n-1} < t < t_{\alpha/2,n-1}) = 1 - \alpha$$

where $t_{\alpha/2,n-1}$ is as defined on page 290, and substituting for $t$ we get

$$P\left(-t_{\alpha/2,n-1} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2,n-1}\right) = 1 - \alpha$$

or, equivalently,

$$P\left(\bar{X} - t_{\alpha/2,n-1}\cdot\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,n-1}\cdot\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$

This leads to the following small-sample confidence interval for $\mu$:
THEOREM 11.2 (Confidence interval for $\mu$, $\sigma$ unknown) If $\bar{x}$ and $s$ are the values of the mean and the standard deviation of a random sample of size $n$ from a normal population with the unknown variance $\sigma^2$, a $(1 - \alpha)100\%$ confidence interval for $\mu$ is given by

$$\bar{x} - t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}}$$

For $n > 30$, this confidence interval formula and the one of Theorem 11.1 with $\sigma$ replaced by $s$ will yield nearly the same results.


EXAMPLE 11.2 


A paint manufacturer wants to determine the average drying time of a new interior wall paint. If for 12 test areas of equal size he obtained a mean drying time of 66.3 minutes and a standard deviation of 8.4 minutes, construct a 95% confidence interval for the true mean $\mu$.


Solution 


Substituting $\bar{x} = 66.3$, $s = 8.4$, and $t_{0.025,11} = 2.201$ (from Table IV), the 95% confidence interval for $\mu$ becomes

$$66.3 - 2.201\cdot\frac{8.4}{\sqrt{12}} < \mu < 66.3 + 2.201\cdot\frac{8.4}{\sqrt{12}}$$

or simply

$$61.0 < \mu < 71.6$$

This means that we can assert, with a 95% degree of confidence, that the interval from 61.0 minutes to 71.6 minutes contains the true average drying time of the paint. ▲
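A minimal sketch of Theorem 11.2 applied to Example 11.2. Only the standard library is used, so the critical value $t_{0.025,11} = 2.201$ is passed in from Table IV rather than computed. If SciPy happens to be installed, `scipy.stats.t.ppf(0.975, 11)` returns essentially the same value. The function name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Theorem 11.2: xbar -/+ t_{alpha/2, n-1} * s / sqrt(n).
    t_crit must be looked up (e.g., in a t table) for n - 1 degrees of freedom."""
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half


# Example 11.2: n = 12, xbar = 66.3, s = 8.4, t_{0.025,11} = 2.201 (Table IV)
lo, hi = t_interval(66.3, 8.4, 12, t_crit=2.201)
print(f"{lo:.1f} < mu < {hi:.1f}")
```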


The method by which we constructed confidence intervals in this section consisted essentially of finding a suitable random variable whose values are determined by the sample data as well as the population parameters, yet whose distribution does not involve the parameter we are trying to estimate. This was the case, for example, when we used the random variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

whose values cannot be calculated without knowledge of $\mu$, but whose distribution for random samples from normal populations, the standard normal distribution, does not involve $\mu$. Although this method of obtaining confidence intervals, sometimes called the pivotal method, is very widely used, and we shall use it again in the next few sections, there exist more general methods, for instance, the one discussed in the book by Mood, Graybill, and Boes referred to on page 384.


11.3 CONFIDENCE INTERVALS
FOR DIFFERENCES BETWEEN MEANS


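In Exercise 4 on page 281 the reader was asked to show that if $\bar{X}_1$ and $\bar{X}_2$ are the means of independent random samples of sizes $n_1$ and $n_2$ from normal populations having the means $\mu_1$ and $\mu_2$ and the variances $\sigma_1^2$ and $\sigma_2^2$, then $\bar{X}_1 - \bar{X}_2$ is a random variable having a normal distribution with the mean

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$

and the variance

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

It follows that

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has the standard normal distribution. Substituting this expression for $Z$ into

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$

the pivotal method leads to

$$P\left((\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - \alpha$$

and, hence, to the following confidence interval for $\mu_1 - \mu_2$: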
THEOREM 11.3 (Confidence interval for $\mu_1 - \mu_2$, $\sigma_1$ and $\sigma_2$ known) If $\bar{x}_1$ and $\bar{x}_2$ are the values of the means of independent random samples of sizes $n_1$ and $n_2$ from normal populations with the known variances $\sigma_1^2$ and $\sigma_2^2$, a $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

By virtue of the central limit theorem, this result can also be used for independent random samples from non-normal populations with the known variances $\sigma_1^2$ and $\sigma_2^2$, provided $n_1$ and $n_2$ are sufficiently large, that is, when $n_1 > 30$ and $n_2 > 30$.


EXAMPLE 11.3 


Construct a 94% confidence interval for the actual difference between the average lifetimes of two kinds of light bulbs, given that a random sample of 40 light bulbs of one kind lasted on the average 418 hours of continuous use and 50 light bulbs of another kind lasted on the average 402 hours. The population standard deviations are known to be $\sigma_1 = 26$ and $\sigma_2 = 22$.


Solution


For $\alpha = 0.06$, we find from Table III that $z_{0.03} = 1.88$. Therefore, the 94% confidence interval for $\mu_1 - \mu_2$ is

$$(418 - 402) - 1.88\sqrt{\frac{26^2}{40} + \frac{22^2}{50}} < \mu_1 - \mu_2 < (418 - 402) + 1.88\sqrt{\frac{26^2}{40} + \frac{22^2}{50}}$$

which reduces to

$$6.3 < \mu_1 - \mu_2 < 25.7$$

Hence, we are 94% confident that the interval from 6.3 to 25.7 contains the true difference between the average lifetimes of the two kinds of light bulbs. The fact that both confidence limits are positive suggests that on the average the first kind of bulb is superior to the second kind. ▲
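A minimal sketch of Theorem 11.3 with the data of Example 11.3. Here $z_{\alpha/2}$ is computed with `statistics.NormalDist().inv_cdf` instead of being read from Table III (1.8808 versus 1.88), which does not change the one-decimal limits. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(x1, x2, sigma1, sigma2, n1, n2, conf=0.95):
    """Theorem 11.3: interval for mu1 - mu2 with sigma1 and sigma2 known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x1 - x2
    return diff - half, diff + half


# Example 11.3: 94% interval, n1 = 40, n2 = 50, sigma1 = 26, sigma2 = 22
lo, hi = two_sample_z_interval(418, 402, 26, 22, 40, 50, conf=0.94)
print(f"{lo:.1f} < mu1 - mu2 < {hi:.1f}")
```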
In order to construct a $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are unknown but $n_1 > 30$ and $n_2 > 30$, we replace $\sigma_1$ and $\sigma_2$ by the values of the sample standard deviations $s_1$ and $s_2$ and proceed as above. The procedure for estimating the difference between two means when $\sigma_1$ and $\sigma_2$ are unknown and the sample sizes are small is not straightforward unless the unknown standard deviations of the two normal populations are equal. If $\sigma_1 = \sigma_2 = \sigma$, then

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

is a random variable having the standard normal distribution, and $\sigma^2$ can be estimated by pooling the squared deviations from the means of the two samples. In Exercise 6 on page 374 the reader will be asked to show that the resulting pooled estimator

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

is, indeed, an unbiased estimator of $\sigma^2$. Now, by Theorems 8.10 and 8.8, the independent random variables

$$\frac{(n_1 - 1)S_1^2}{\sigma^2} \quad\text{and}\quad \frac{(n_2 - 1)S_2^2}{\sigma^2}$$

have chi-square distributions with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, and their sum

$$Y = \frac{(n_1 - 1)S_1^2}{\sigma^2} + \frac{(n_2 - 1)S_2^2}{\sigma^2} = \frac{(n_1 + n_2 - 2)S_p^2}{\sigma^2}$$

has a chi-square distribution with $n_1 + n_2 - 2$ degrees of freedom. As it can be shown that the above random variables $Z$ and $Y$ are independent (see references on page 384), it follows from Theorem 8.11 that

$$t = \frac{Z}{\sqrt{\dfrac{Y}{n_1 + n_2 - 2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has a $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom. Substituting this expression for $t$ into

$$P(-t_{\alpha/2,n_1+n_2-2} < t < t_{\alpha/2,n_1+n_2-2}) = 1 - \alpha$$

and algebraically simplifying the result, we arrive at the following $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$:


THEOREM 11.4 (Confidence interval for $\mu_1 - \mu_2$, $\sigma_1 = \sigma_2$ unknown) If $\bar{x}_1$ and $\bar{x}_2$ are the values of the means of independent random samples of sizes $n_1$ and $n_2$ from normal populations with unknown but equal variances, a $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,n_1+n_2-2}\cdot s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,n_1+n_2-2}\cdot s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $s_p$ is the square root of the value of the pooled estimator of the population variance given on page 371.


EXAMPLE 11.4 


A study has been made to compare the nicotine contents of two brands of cigarettes. Ten cigarettes of Brand A had an average nicotine content of 3.1 milligrams with a standard deviation of 0.5 milligram, while eight cigarettes of Brand B had an average nicotine content of 2.7 milligrams with a standard deviation of 0.7 milligram. Assuming that the two sets of data are random samples from normal populations with equal variances, construct a 95% confidence interval for the true difference in the average nicotine content of the two brands of cigarettes.


Solution


Let us summarize the data as follows:

Brand A: $n_1 = 10$, $\bar{x}_1 = 3.1$, $s_1 = 0.5$
Brand B: $n_2 = 8$, $\bar{x}_2 = 2.7$, $s_2 = 0.7$

For $\alpha = 0.05$ and $n_1 + n_2 - 2 = 16$ degrees of freedom, we find from Table IV that $t_{0.025,16} = 2.120$. The value of $s_p$ is

$$s_p = \sqrt{\frac{9(0.25) + 7(0.49)}{16}} = 0.596$$

and, therefore, the 95% confidence interval for $\mu_1 - \mu_2$ is

$$(3.1 - 2.7) - 2.120(0.596)\sqrt{\frac{1}{10} + \frac{1}{8}} < \mu_1 - \mu_2 < (3.1 - 2.7) + 2.120(0.596)\sqrt{\frac{1}{10} + \frac{1}{8}}$$

which reduces to

$$-0.20 < \mu_1 - \mu_2 < 1.00$$

▲


Observe that since the actual difference might be zero in the preceding example, we cannot conclude that there is a real difference in the nicotine content of the two brands of cigarettes. More about that in Chapter 13.
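The pooled-variance interval of Theorem 11.4 is easy to script. Below is a minimal sketch reproducing Example 11.4. The critical value $t_{0.025,16} = 2.120$ is passed in from Table IV, and the function names are hypothetical.

```python
from math import sqrt


def pooled_sd(s1, s2, n1, n2):
    """Square root of the pooled estimate ((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_interval(x1, x2, s1, s2, n1, n2, t_crit):
    """Theorem 11.4; t_crit = t_{alpha/2, n1+n2-2} from a t table."""
    sp = pooled_sd(s1, s2, n1, n2)
    half = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    return x1 - x2 - half, x1 - x2 + half


# Example 11.4: Brand A (n=10, mean 3.1, s 0.5), Brand B (n=8, mean 2.7, s 0.7)
sp = pooled_sd(0.5, 0.7, 10, 8)
lo, hi = pooled_t_interval(3.1, 2.7, 0.5, 0.7, 10, 8, t_crit=2.120)
print(f"s_p = {sp:.3f}; {lo:.2f} < mu1 - mu2 < {hi:.2f}")
```

Since the printed interval contains 0, the sketch reflects the remark above: the data do not establish a real difference between the brands.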


THEORETICAL EXERCISES 

1. If $x$ is a value of a random variable having an exponential distribution, find $k$ so that the interval from 0 to $kx$ is a $(1 - \alpha)100\%$ confidence interval for the parameter $\theta$.

2. If $x_1$ and $x_2$ are the values of a random sample of size 2 from a population having a uniform density with $\alpha = 0$ and $\beta = \theta$, find $k$ so that

$$0 < \theta < k(x_1 + x_2)$$

is a $(1 - \alpha)100\%$ confidence interval for $\theta$. (Hint: Make use of the fact that $X_1 + X_2$ has a triangular density similar to that pictured in Figure 7.8.)
3. By using the methods of Section 8.7, it can be shown that for a random sample of size 2 from the population of Exercise 2 the distribution of the sample range is given by

$$f(R) = \begin{cases} \dfrac{2}{\theta^2}(\theta - R) & \text{for } 0 < R < \theta \\ 0 & \text{elsewhere} \end{cases}$$

Use this result to find $c$ so that

$$R < \theta < cR$$

is a $(1 - \alpha)100\%$ confidence interval for $\theta$.
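4. Show that the $(1 - \alpha)100\%$ confidence interval

$$\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$

is shorter than the corresponding interval given by

$$\bar{x} - z_{2\alpha/3}\cdot\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/3}\cdot\frac{\sigma}{\sqrt{n}}$$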


5. If $\bar{x}$ is used as an estimate of $\mu$, show that we can be $(1 - \alpha)100\%$ confident that $|\bar{x} - \mu|$, the absolute value of our error, does not exceed a specified amount $e$ when the sample size is

$$n = \left[\frac{z_{\alpha/2}\cdot\sigma}{e}\right]^2$$

6. Show that $S_p^2$ is an unbiased estimator of $\sigma^2$ and find its variance.

7. Verify the result on page 371 which expresses $t$ in terms of $\bar{X}_1$, $\bar{X}_2$, and $S_p$.


APPLIED EXERCISES 


8. Measurements of the blood pressure of 25 elderly women have a mean of $\bar{x} = 140$ mm of mercury. If these data can be looked upon as a random sample from a normal population with $\sigma = 10$ mm of mercury, construct a 95% confidence interval for the population mean $\mu$.


9. For several years, a mathematics placement test had been administered to all incoming freshmen at a certain college. If 64 students, randomly selected over this period of time, took on the average 28.5 minutes to complete the test with a variance of 9.3 minutes², construct a 99% confidence interval for the true average time it takes a freshman to complete the test.

10. An efficiency expert wants to determine the average time it takes a pit crew to change a set of four tires on a race car. Use the result of Exercise 5 to determine the sample size required to be able to assert with 95% confidence that the sample mean does not differ from the true mean by more than 2 seconds. It is known from previous studies that the population standard deviation is 12 seconds.

11. The length of the skulls of 10 fossil skeletons of an extinct species of birds has a mean of 5.68 cm and a standard deviation of 0.29 cm. Assuming that such measurements are normally distributed, find a 95% confidence interval for the mean length of the skulls of this species of birds.
12. A food inspector, examining 12 jars of a certain brand of peanut butter,
obtained the following percentages of impurities: 2.3, 1.9, 2.1, 2.8, 2.3, 3.6, 
1.4, 1.8, 2.1, 3.2, 2.0, and 1.9. Assuming that such determinations are normally 
distributed, construct a 99% confidence interval for the average percentage 
of impurities in this brand of peanut butter. 


13. A random sample of size $n_1 = 16$ from a normal population with $\sigma_1 = 4.8$ has the mean $\bar{x}_1 = 18$, and a random sample of size $n_2 = 25$ from a different normal population with $\sigma_2 = 3.5$ has the mean $\bar{x}_2 = 23$. Find a 90% confidence interval for $\mu_1 - \mu_2$.


14. A study of two kinds of photocopying equipment shows that 60 failures of 
the first kind of equipment took on the average 80.7 minutes to repair with 
a standard deviation of 19.4 minutes, while 60 failures of the second kind 
of equipment took on the average 88.1 minutes to repair with a standard 
deviation of 18.8 minutes. Find a 99% confidence interval for the difference 
between the true average times it takes to repair failures of the two kinds of 
photocopying equipment. 

15. Twelve randomly selected mature citrus trees of one variety have a mean 
height of 13.8 feet with a standard deviation of 1.2 feet, and fifteen randomly 
selected mature citrus trees of another variety have a mean height of 12.9 
feet with a standard deviation of 1.5 feet. Assuming that the random samples 
were selected from normal populations with equal variances, construct a 
95% confidence interval for the difference between the true average heights 
of the two kinds of citrus trees. 

16. The following are the heat-producing capacities of coal from two mines (in 
millions of calories per ton): 


Mine A: 8,500, 8,330, 8,480, 7,960, 8,030 
Mine B: 7,710, 7,890, 7,920, 8,270, 7,860


Assuming that the data constitute independent random samples from normal 
populations with equal variances, construct a 99% confidence interval for 
the difference between the true average heat-producing capacities of coal 


from the two mines. 
11.4 CONFIDENCE INTERVALS FOR PROPORTIONS


There are many problems in which we must estimate proportions, probabilities, percentages, or rates, such as the proportion of defectives in a large shipment of transistors, the probability that a car stopped at a road block will have faulty lights, the percentage of school children with I.Q.'s over 115, or the mortality rate of a disease. In many of these it is reasonable to assume that we are sampling a binomial population, and, hence, that our problem is to estimate the binomial parameter $\theta$. Making use of the fact that for large $n$ the binomial distribution can be approximated with a normal distribution, namely, that the random variable

$$Z = \frac{X - n\theta}{\sqrt{n\theta(1 - \theta)}}$$

can be treated as if it had the standard normal distribution, we can write

$$P\left(-z_{\alpha/2} < \frac{X - n\theta}{\sqrt{n\theta(1 - \theta)}} < z_{\alpha/2}\right) \approx 1 - \alpha$$

and obtain a $(1 - \alpha)100\%$ confidence interval for $\theta$ by solving the two inequalities

$$\frac{X - n\theta}{\sqrt{n\theta(1 - \theta)}} > -z_{\alpha/2} \quad\text{and}\quad \frac{X - n\theta}{\sqrt{n\theta(1 - \theta)}} < z_{\alpha/2}$$

for the corresponding confidence limits. Leaving the details of this to the reader in Exercise 1 on page 382, let us instead give here a large-sample approximation by first writing the above probability as

$$P\left(\hat\Theta - z_{\alpha/2}\sqrt{\frac{\theta(1 - \theta)}{n}} < \theta < \hat\Theta + z_{\alpha/2}\sqrt{\frac{\theta(1 - \theta)}{n}}\right) \approx 1 - \alpha$$

where $\hat\Theta = X/n$. Then, substituting $\hat\theta$ for $\theta$ inside the radicals, which is a further approximation, we get†


THEOREM 11.5 (Large-sample confidence interval for $\theta$) An approximate $(1 - \alpha)100\%$ confidence interval for the binomial parameter $\theta$ is given by

$$\hat\theta - z_{\alpha/2}\sqrt{\frac{\hat\theta(1 - \hat\theta)}{n}} < \theta < \hat\theta + z_{\alpha/2}\sqrt{\frac{\hat\theta(1 - \hat\theta)}{n}}$$

†In Exercise 3 on page 382 the reader will be asked to show how this result will have to be modified if we use the continuity correction when approximating the binomial distribution with a normal distribution.


EXAMPLE 11.5


A study is being made to estimate the proportion of voters in a sizeable community 
who favor the construction of a nuclear power plant. If it is found that only 140 
of 400 voters selected at random favor the project, find a 95% confidence interval 
for the proportion of all voters in this community who favor the project. 


Solution 


Substituting $\hat\theta = \frac{140}{400} = 0.35$ and $z_{0.025} = 1.96$ into the above large-sample confidence interval for $\theta$, we obtain

$$0.35 - 1.96\sqrt{\frac{(0.35)(0.65)}{400}} < \theta < 0.35 + 1.96\sqrt{\frac{(0.35)(0.65)}{400}}$$

which reduces to

$$0.303 < \theta < 0.397$$

▲


Confidence intervals for the binomial parameter $\theta$ can also be obtained from the special tables referred to on page 384. They are especially useful when $n$ is small.
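For comparison, the sketch below computes both the large-sample interval of Theorem 11.5 and the limits obtained by solving the two inequalities exactly (the formula stated in Exercise 1 on page 382, commonly called the Wilson score interval) for the data of Example 11.5. It uses $z_{0.025} = 1.96$ as in the text, and the function names are hypothetical. This is an illustration, not a replacement for the special tables mentioned above.

```python
from math import sqrt


def large_sample_interval(x, n, z):
    """Theorem 11.5: theta_hat -/+ z * sqrt(theta_hat (1 - theta_hat) / n)."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def score_interval(x, n, z):
    """Limits from solving -z < (x - n*theta)/sqrt(n*theta*(1-theta)) < z for theta."""
    center = x + z**2 / 2
    half = z * sqrt(x * (n - x) / n + z**2 / 4)
    return (center - half) / (n + z**2), (center + half) / (n + z**2)


# Example 11.5: 140 of 400 voters favor the project
lo, hi = large_sample_interval(140, 400, 1.96)
s_lo, s_hi = score_interval(140, 400, 1.96)
print(f"large-sample: {lo:.3f} < theta < {hi:.3f}")
print(f"score:        {s_lo:.3f} < theta < {s_hi:.3f}")
```

With $n = 400$ the two intervals nearly coincide. For small $n$ or $\hat\theta$ near 0 or 1 they can differ noticeably, which is one reason the exact-limit approach and tables are useful.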


11.5 CONFIDENCE INTERVALS FOR 
DIFFERENCES BETWEEN PROPORTIONS 


Problems frequently arise where it is desirable to estimate the difference between the binomial parameters $\theta_1$ and $\theta_2$ on the basis of independent random samples from two binomial populations. Such is the case, for example, if we wish to estimate the difference between the proportions of voters in two different districts that favor Candidate X for election to the United States Senate.

If the respective numbers of successes are $x_1$ and $x_2$ and the corresponding sample proportions are $\hat\theta_1 = x_1/n_1$ and $\hat\theta_2 = x_2/n_2$, let us investigate the sampling distribution of $\hat\Theta_1 - \hat\Theta_2$ as a potential estimator of $\theta_1 - \theta_2$. For large values of $n_1$ and $n_2$, the distributions of $X_1$ and $X_2$ can again be approximated with normal distributions having the means $n_1\theta_1$ and $n_2\theta_2$ and the variances $n_1\theta_1(1 - \theta_1)$ and $n_2\theta_2(1 - \theta_2)$. Therefore, according to Exercise 6 on page 281, it follows that $\hat\Theta_1 - \hat\Theta_2$ has approximately a normal distribution with the mean

$$\mu_{\hat\Theta_1 - \hat\Theta_2} = \theta_1 - \theta_2$$

and the variance

$$\sigma_{\hat\Theta_1 - \hat\Theta_2}^2 = \frac{\theta_1(1 - \theta_1)}{n_1} + \frac{\theta_2(1 - \theta_2)}{n_2}$$

and the random variable

$$Z = \frac{(\hat\Theta_1 - \hat\Theta_2) - (\theta_1 - \theta_2)}{\sqrt{\dfrac{\theta_1(1 - \theta_1)}{n_1} + \dfrac{\theta_2(1 - \theta_2)}{n_2}}}$$

has approximately the standard normal distribution. Substituting this expression for $Z$ into

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$

and proceeding as in Section 11.4, we arrive at the following result:


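THEOREM 11.6 (Large-sample confidence interval for $\theta_1 - \theta_2$) An approximate $(1 - \alpha)100\%$ confidence interval for $\theta_1 - \theta_2$, the difference between two binomial parameters, is given by

$$(\hat\theta_1 - \hat\theta_2) - z_{\alpha/2}\sqrt{\frac{\hat\theta_1(1 - \hat\theta_1)}{n_1} + \frac{\hat\theta_2(1 - \hat\theta_2)}{n_2}} < \theta_1 - \theta_2 < (\hat\theta_1 - \hat\theta_2) + z_{\alpha/2}\sqrt{\frac{\hat\theta_1(1 - \hat\theta_1)}{n_1} + \frac{\hat\theta_2(1 - \hat\theta_2)}{n_2}}$$

where $\hat\theta_1 = x_1/n_1$ and $\hat\theta_2 = x_2/n_2$.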


EXAMPLE 11.6


If it is found that 132 of 200 voters in District A favor a given candidate for election to the United States Senate and 90 of 150 voters in District B favor this same candidate, find a 99% confidence interval for $\theta_1 - \theta_2$, the difference between the actual proportions of voters from the two districts who favor the candidate.


†In Exercise 4 on page 382 the reader will be asked to fill in the details.


Solution


Substituting $\hat\theta_1 = \frac{132}{200} = 0.66$, $\hat\theta_2 = \frac{90}{150} = 0.60$, and $z_{0.005} = 2.575$ into the large-sample confidence interval of Theorem 11.6, we obtain

$$(0.66 - 0.60) - 2.575\sqrt{\frac{(0.66)(0.34)}{200} + \frac{(0.60)(0.40)}{150}} < \theta_1 - \theta_2 < (0.66 - 0.60) + 2.575\sqrt{\frac{(0.66)(0.34)}{200} + \frac{(0.60)(0.40)}{150}}$$

which reduces to

$$-0.074 < \theta_1 - \theta_2 < 0.194$$

Thus, we are 99% confident that the interval from -0.074 to 0.194 contains the difference between the actual proportions of voters from the two districts who favor the candidate. Observe that this includes the possibility of a zero difference between the two proportions. ▲
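A minimal sketch of Theorem 11.6 for the data of Example 11.6, using $z_{0.005} = 2.575$ from the text. The function name is hypothetical.

```python
from math import sqrt


def two_prop_interval(x1, n1, x2, n2, z):
    """Theorem 11.6 (large-sample) interval for theta1 - theta2."""
    p1, p2 = x1 / n1, x2 / n2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half, p1 - p2 + half


# Example 11.6: 132 of 200 in District A, 90 of 150 in District B
lo, hi = two_prop_interval(132, 200, 90, 150, z=2.575)
print(f"{lo:.3f} < theta1 - theta2 < {hi:.3f}")
```

The printed interval straddles 0, matching the observation that a zero difference cannot be ruled out at this level of confidence.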
11.6 CONFIDENCE INTERVALS FOR VARIANCES


Given a random sample of size $n$ from a normal population, we can obtain a $(1 - \alpha)100\%$ confidence interval for $\sigma^2$ by making use of Theorem 8.10, according to which

$$\frac{(n - 1)S^2}{\sigma^2}$$

is a random variable having a chi-square distribution with $n - 1$ degrees of freedom. Thus, we have

$$P\left(\chi^2_{1-\alpha/2,n-1} < \frac{(n - 1)S^2}{\sigma^2} < \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha$$

or

$$P\left(\frac{(n - 1)S^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right) = 1 - \alpha$$

where $\chi^2_{\alpha/2,n-1}$ and $\chi^2_{1-\alpha/2,n-1}$ are as defined on page 288, and we get


THEOREM 11.7 (Confidence interval for $\sigma^2$) If $s^2$ is the value of the variance of a random sample of size $n$ from a normal population, a $(1 - \alpha)100\%$ confidence interval for $\sigma^2$ is given by

$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,n-1}}$$

Corresponding $(1 - \alpha)100\%$ confidence limits for $\sigma$ can be obtained by taking the square roots of the confidence limits for $\sigma^2$.


EXAMPLE 11.7 


In 16 test runs the gasoline consumption of an experimental engine had a standard deviation of 2.2 gallons. Construct a 99% confidence interval for $\sigma^2$, measuring the true variability of the gasoline consumption of this engine.


Solution


Assuming that the observed data can be looked upon as a random sample from a normal population, we substitute $n = 16$ and $s = 2.2$, along with $\chi^2_{0.005,15} = 32.801$ and $\chi^2_{0.995,15} = 4.601$, obtained from Table V, into the confidence interval of Theorem 11.7, and we get

$$\frac{15(2.2)^2}{32.801} < \sigma^2 < \frac{15(2.2)^2}{4.601}$$

or

$$2.21 < \sigma^2 < 15.78$$

Taking square roots, we find that the corresponding 99% confidence interval for $\sigma$ is given by

$$1.49 < \sigma < 3.97$$

▲
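A minimal sketch of Theorem 11.7 with the data of Example 11.7. The two chi-square percentage points for 15 degrees of freedom are taken from Table V as quoted above. The larger one, $\chi^2_{\alpha/2,n-1}$, divides to give the lower limit. The function name is hypothetical.

```python
from math import sqrt


def variance_interval(s, n, chi2_upper, chi2_lower):
    """Theorem 11.7: (n-1)s^2/chi2_upper < sigma^2 < (n-1)s^2/chi2_lower, where
    chi2_upper = chi^2_{alpha/2, n-1} (larger table value) and
    chi2_lower = chi^2_{1-alpha/2, n-1} (smaller table value)."""
    ss = (n - 1) * s**2
    return ss / chi2_upper, ss / chi2_lower


# Example 11.7: n = 16, s = 2.2, 99% confidence, 15 degrees of freedom
v_lo, v_hi = variance_interval(2.2, 16, chi2_upper=32.801, chi2_lower=4.601)
sd_lo, sd_hi = sqrt(v_lo), sqrt(v_hi)
print(f"{v_lo:.2f} < sigma^2 < {v_hi:.2f};  {sd_lo:.2f} < sigma < {sd_hi:.2f}")
```

Note that the interval is not symmetric about $s^2 = 4.84$, because the chi-square distribution is skewed.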




11.7 CONFIDENCE INTERVALS
FOR RATIOS OF TWO VARIANCES


If $S_1^2$ and $S_2^2$ are the sample variances of independent random samples of sizes $n_1$ and $n_2$ from normal populations, then according to Theorem 8.14

$$F = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

is a random variable having an $F$ distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Substituting this expression for $F$ into

$$P(F_{1-\alpha/2,n_1-1,n_2-1} < F < F_{\alpha/2,n_1-1,n_2-1}) = 1 - \alpha$$

where $F_{1-\alpha/2,n_1-1,n_2-1}$ and $F_{\alpha/2,n_1-1,n_2-1}$ are as defined on page 293, making use of the fact that $F_{1-\alpha/2,n_1-1,n_2-1} = 1/F_{\alpha/2,n_2-1,n_1-1}$ (see Exercise 17 on page 297), and proceeding as in Section 11.4, we arrive at the following result:


THEOREM 11.8 (Confidence interval for $\sigma_1^2/\sigma_2^2$) If $s_1^2$ and $s_2^2$ are the values of the variances of independent random samples of sizes $n_1$ and $n_2$ from two normal populations, a $(1 - \alpha)100\%$ confidence interval for $\sigma_1^2/\sigma_2^2$ is given by

$$\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,n_1-1,n_2-1}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,n_2-1,n_1-1}$$

Corresponding $(1 - \alpha)100\%$ confidence limits for $\sigma_1/\sigma_2$ can be obtained by taking the square roots of the confidence limits for $\sigma_1^2/\sigma_2^2$.
2 


EXAMPLE 11.8


With reference to Example 11.4, find a 98% confidence interval for $\sigma_1^2/\sigma_2^2$.


†In Exercise 6 on page 383 the reader will be asked to fill in the details.
Solution


From Example 11.4 we have $n_1 = 10$, $n_2 = 8$, $s_1 = 0.5$, and $s_2 = 0.7$, and from Table VIb we find that $F_{0.01,9,7} = 6.72$ and $F_{0.01,7,9} = 5.61$. Thus, substitution into the confidence interval of Theorem 11.8 yields

$$\frac{0.25}{0.49}\cdot\frac{1}{6.72} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{0.25}{0.49}\cdot 5.61$$

or

$$0.076 < \frac{\sigma_1^2}{\sigma_2^2} < 2.862$$

▲
93 


Since the interval obtained in the preceding example includes the possibility that the ratio is 1, there is no real evidence here against the assumption of equal population variances in Example 11.4.
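A minimal sketch of Theorem 11.8 for Example 11.8. Note the order of the degrees of freedom: $F_{\alpha/2,n_1-1,n_2-1}$ divides the variance ratio, while $F_{\alpha/2,n_2-1,n_1-1}$ multiplies it. Both values are taken from Table VIb as quoted in the example, and the function name is hypothetical.

```python
def variance_ratio_interval(s1, s2, f_n1_n2, f_n2_n1):
    """Theorem 11.8 with f_n1_n2 = F_{alpha/2, n1-1, n2-1} and
    f_n2_n1 = F_{alpha/2, n2-1, n1-1}."""
    ratio = s1**2 / s2**2
    return ratio / f_n1_n2, ratio * f_n2_n1


# Example 11.8: s1 = 0.5, s2 = 0.7, F_{0.01,9,7} = 6.72, F_{0.01,7,9} = 5.61
lo, hi = variance_ratio_interval(0.5, 0.7, f_n1_n2=6.72, f_n2_n1=5.61)
print(f"{lo:.3f} < sigma1^2/sigma2^2 < {hi:.3f}")
print("ratio 1 inside interval:", lo < 1 < hi)
```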


THEORETICAL EXERCISES 


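1. By solving the inequalities on page 376, namely,

$$\frac{x - n\theta}{\sqrt{n\theta(1 - \theta)}} > -z_{\alpha/2} \quad\text{and}\quad \frac{x - n\theta}{\sqrt{n\theta(1 - \theta)}} < z_{\alpha/2}$$

show that the $(1 - \alpha)100\%$ confidence limits for $\theta$ are

$$\frac{x + \frac{1}{2}z_{\alpha/2}^2 \pm z_{\alpha/2}\sqrt{\dfrac{x(n - x)}{n} + \dfrac{1}{4}z_{\alpha/2}^2}}{n + z_{\alpha/2}^2}$$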


2. Use the large-sample confidence interval for $\theta$ to show that we can be at least $(1 - \alpha)100\%$ confident that $|\hat\theta - \theta|$, the absolute value of the error which we make when we use the sample proportion $\hat\theta = x/n$ as an estimate of $\theta$, will not exceed a specified amount $e$ when the sample size is

$$n = \frac{z_{\alpha/2}^2}{4e^2}$$
3. Modify the large-sample confidence interval of Theorem 11.5 to account for 
the continuity correction which we use when we approximate a binomial 
distribution with a normal distribution. 


4. Fill in the details which lead from the $Z$ statistic on page 378 and the probability $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$ to the confidence interval of Theorem 11.6.


5. For large $n$, the sampling distribution of $S$ is sometimes approximated with a normal distribution having the mean $\sigma$ and the variance $\sigma^2/2n$ (see Exercise 5 on page 295). Show that this approximation leads to the following large-sample $(1 - \alpha)100\%$ confidence interval for $\sigma$:

$$\frac{s}{1 + \dfrac{z_{\alpha/2}}{\sqrt{2n}}} < \sigma < \frac{s}{1 - \dfrac{z_{\alpha/2}}{\sqrt{2n}}}$$

6. Fill in the details which lead from the $F$ statistic on page 381 and the probability $P(F_{1-\alpha/2,n_1-1,n_2-1} < F < F_{\alpha/2,n_1-1,n_2-1}) = 1 - \alpha$ to the confidence interval of Theorem 11.8.
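The two sample-size formulas stated in these exercises (Exercise 5 of the exercises following Section 11.3 for a mean, and Exercise 2 above for a proportion) can be turned into a small helper. Since a sample size must be a whole number and the bound must still hold, the sketch rounds up. The target errors used below are hypothetical, and $z = 1.96$ corresponds to 95% confidence.

```python
import math


def n_for_mean(sigma, e, z):
    """Smallest n with z * sigma / sqrt(n) <= e, i.e. n >= (z * sigma / e)^2."""
    return math.ceil((z * sigma / e) ** 2)


def n_for_proportion(e, z):
    """Smallest n with z * sqrt(theta(1 - theta)/n) <= e for every theta,
    using theta(1 - theta) <= 1/4, i.e. n >= z^2 / (4 e^2)."""
    return math.ceil(z**2 / (4 * e**2))


# Hypothetical targets at 95% confidence (z = 1.96)
print(n_for_mean(sigma=8, e=2, z=1.96))   # (7.84)^2 = 61.47 -> 62
print(n_for_proportion(e=0.05, z=1.96))   # 384.16 -> 385
```

The proportion formula is conservative: it uses the worst case $\theta = 1/2$, which is why Exercise 2 says "at least" $(1 - \alpha)100\%$ confident.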


APPLIED EXERCISES 


7. A sample survey at a supermarket showed that 204 of 300 shoppers regularly use cents-off coupons. Use the large-sample confidence interval of Theorem 11.5 to find a 99% confidence interval for the corresponding true proportion.

8. In a random sample of 250 television viewers in a certain area, 190 had seen a certain controversial program. Construct a 95% confidence interval for the corresponding true proportion, using
(a) the large-sample confidence interval of Theorem 11.5;
(b) the confidence limits of Exercise 1.

9. Use the theory of Exercise 2 to find the minimum sample size which will enable us to assert with a degree of confidence of at least 95% that a sample proportion (which is used to estimate the parameter $\theta$ of a binomial population) is "off" by at most 0.03.

10. Use the theory of Exercise 2 to find the minimum sample size which will enable us to assert with a degree of confidence of at least 99% that a sample proportion (which is used to estimate the parameter $\theta$ of a binomial population) is "off" by at most 0.02.

11. In a random sample of visitors to a famous tourist attraction, 84 of 250 men and 156 of 250 women bought souvenirs. Construct a 95% confidence interval for the difference between the true proportions of men and women who buy souvenirs at this tourist attraction.

12. Among … marriage license applications, chosen at random in 1971, there were … in which the women were at least one year older than the men, and among 400 marriage license applications, chosen at random in 1977, there were 68 in which the women were at least one year older than the men. Construct a 99% confidence interval for the difference between the corresponding true proportions of marriage license applications in which the women were at least one year older than the men.
13. With reference to Exercise 11 on page 374, construct a 95% confidence
interval for the true variance of the length of the skulls of the given species
of birds.

14. Use the data of Exercise 9 on page 374 and the large-sample confidence
interval of Exercise 5 to construct a 99% confidence interval for the true
standard deviation of the time it takes students to complete the test.

15. With reference to Exercise 15 on page 375, construct a 98% confidence
interval for the ratio of the two population variances.

16. With reference to Exercise 16 on page 375, construct a 90% confidence interval
for the ratio of the two population standard deviations.
12

Hypothesis Testing: Theory

12.1 INTRODUCTION


If an engineer has to decide on the basis of sample data whether the true average 
lifetime of a certain kind of tire is at least 22,000 miles, if an agronomist has to 
decide on the basis of experiments whether one kind of fertilizer produces a 
higher yield of soybeans than another, and if a manufacturer of pharmaceutical 
products has to decide on the basis of samples whether 90 percent of all patients 
given a new medication will recover from a certain disease, these problems can 
all be translated into the language of statistical tests of hypotheses. In the first 
case we might say that the engineer has to test the hypothesis that θ, the parameter
of an exponential population, is at least 22,000; in the second case we might say
that the agronomist has to decide whether μ1 > μ2, where μ1 and μ2 are the
means of two normal populations; and in the third case we might say that the
manufacturer has to decide whether θ, the parameter of a binomial population,
equals 0.90. In each case it must be assumed, of course, that the chosen distribution 
correctly describes the experimental conditions, namely, that the distribution 
provides the correct statistical model. 

As in the above examples, most tests of statistical hypotheses concern the
parameters of distributions, but sometimes they also concern the type, or nature,
of the distributions themselves. For instance, in the first of our three examples
the engineer may also have to decide whether he is actually dealing with a sample
from an exponential population, or whether his data are values of random
variables having, say, the Weibull distribution of Exercise 15 on page 218.


DEFINITION 12.1 A statistical hypothesis is an assertion or conjecture about
the distribution of one or more random variables. If a statistical hypothesis
completely specifies the distribution, it is referred to as a simple hypothesis;
if not, it is referred to as a composite hypothesis.


A simple hypothesis must therefore specify not only the functional form of 
the underlying distribution, but also the values of all parameters. Thus, in the 
third of the above examples, the one dealing with the effectiveness of the new 
medication, the hypothesis θ = 0.90 is simple, assuming, of course, that we
specify the sample size and that the population is binomial. However, in the first
of the above examples the hypothesis is composite since θ ≥ 22,000 does not
assign a specific value to the parameter θ.

To be able to construct suitable criteria for testing statistical hypotheses, it 
is necessary that we also formulate alternative hypotheses. For instance, in the 
example dealing with the lifetimes of the tires we might formulate the alternative 
hypothesis that the parameter θ of the exponential population is less than 22,000;
in the example dealing with the two kinds of fertilizer we might formulate the
alternative hypothesis μ1 = μ2; and in the example dealing with the new medication
we might formulate the alternative hypothesis that the parameter θ of the
given binomial population is only 0.60, which is the disease's recovery rate without
the new medication. 

The concept of simple and composite hypotheses applies also to alternative 
hypotheses, and in the first example we can now say that we are testing the 
composite hypothesis θ ≥ 22,000 against the composite alternative θ < 22,000,
where θ is the parameter of an exponential population. Similarly, in the second
example we are testing the composite hypothesis μ1 > μ2 against the composite
alternative μ1 = μ2, where μ1 and μ2 are the means of two normal populations,
and in the third example we are testing the simple hypothesis θ = 0.90 against
the simple alternative θ = 0.60, where θ is the parameter of a binomial population
for which n is given.

Frequently, statisticians state as their hypotheses the opposite of what they 
believe to be true, with the hope that the test procedures will lead to their rejection. 
For instance, if we want to show that the students in one school have a higher 
average I.Q. than those of another, we might formulate the hypothesis that there 
is no difference, namely, that μ1 = μ2. Similarly, if we want to show that one
kind of ore has a higher percentage content of uranium than another kind of
ore, we might formulate the hypothesis that the two percentages are the same;
and if we want to show that there is a greater variability in the quality of one
product than there is in the quality of another, we might formulate the hypothesis
that there is no difference, namely, that σ1 = σ2. In view of the assumptions of
no difference, hypotheses like these led to the term null hypothesis, although 
nowadays this term is applied to any hypothesis we may wish to test. 

Symbolically, we shall use the symbol H0 for the null hypothesis we want
to test and H1 for the alternative hypothesis. Problems involving more than two
hypotheses, that is, problems involving several alternative hypotheses, tend to be 
quite complicated, and will not be studied in this book. 


12.2 TESTING A STATISTICAL HYPOTHESIS


The testing of a statistical hypothesis is the application of an explicit set of rules 
for deciding whether to accept the null hypothesis or to reject it in favor of the 
alternative hypothesis. Suppose, for example, that a statistician wants to test the 
null hypothesis θ = θ0 against the alternative hypothesis θ = θ1. In order to
make a choice, he will generate sample data by conducting an experiment and
then compute the value of a test statistic, which will tell him for each possible
outcome of the sample space what action to take. The test procedure, therefore,
partitions the possible values of the test statistic into two subsets: an acceptance
region for H0 and a rejection region for H0.

The procedure just described can lead to two kinds of errors. For instance, 
if the true value of the parameter θ is θ0 and the statistician incorrectly concludes
that θ = θ1, he is committing an error referred to as a type I error. On the other
hand, if the true value of the parameter θ is θ1 and the statistician incorrectly
concludes that θ = θ0, he is committing a second kind of error referred to as a
type II error.


DEFINITION 12.2

1. Rejection of the null hypothesis when it is true is called a type I
error; the probability of committing a type I error is denoted by α.
2. Acceptance of the null hypothesis when it is false is called a type II
error; the probability of committing a type II error is denoted by β.


It is customary to refer to the rejection region for H0 as the critical region
of the test, and to the probability of obtaining a value of the test statistic inside
the critical region when H0 is true as the size of the critical region. Thus, the size
of a critical region is just the probability α of committing a type I error. This
probability is also called the level of significance of the test (see discussion on
page 401).
EXAMPLE 12.1

With reference to the third illustration on page 385, suppose the manufacturer
of the new medication wants to test the null hypothesis θ = 0.90 against the
alternative hypothesis θ = 0.60. His test statistic is x, the observed number of
successes in n = 20 trials, and he will accept the null hypothesis if x ≥ 15;
otherwise, he will conclude that θ = 0.60. Evaluate the probabilities α and β.


Solution 


The acceptance region for H0 is given by x = 15, 16, 17, 18, 19, and 20,
and, correspondingly, the rejection or critical region is given by x = 0, 1,
2, ..., 14. Therefore, from Table I,

$$\alpha = P(\text{type I error}) = P(x < 15;\ \theta = 0.90) = 0.0114$$

and

$$\beta = P(\text{type II error}) = P(x \ge 15;\ \theta = 0.60) = 0.1255$$

▲


A good test procedure is one in which both α and β are small, thereby
giving us a good chance of making the correct decision. The probability of a type
II error in Example 12.1 is rather high, but this can be reduced by appropriately
changing the critical region. For instance, if we use the acceptance region x ≥ 16
in Example 12.1, so that the critical region is x < 16, it can easily be checked
that this would make α = 0.0433 and β = 0.0509. Thus, while the probability
of a type II error has become smaller, the probability of a type I error has become
larger. The only way in which we can reduce the probabilities of both types of
errors is to increase the size of the sample; so long as n is held fixed, this
inverse relationship between the probabilities of type I and type II errors is
typical of statistical decision procedures. In other words, if the probability of
one type of error is reduced, that of the other type of error is increased.
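The error probabilities of Example 12.1 and of the modified critical region x < 16 can be checked with exact binomial sums instead of Table I. This is a minimal Python sketch (the helper names are hypothetical); the exact values agree with the tabled 0.0114, 0.1255, 0.0433, and 0.0509 up to rounding in the fourth decimal place, and they show the trade-off between α and β when n is held fixed.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with parameters n and p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def error_probabilities(c, n=20, theta0=0.90, theta1=0.60):
    """Accept H0 when x >= c; the critical region is x < c, i.e. x = 0, 1, ..., c - 1."""
    alpha = binom_cdf(c - 1, n, theta0)      # P(x < c; theta = theta0): type I error
    beta = 1 - binom_cdf(c - 1, n, theta1)   # P(x >= c; theta = theta1): type II error
    return alpha, beta


for c in (15, 16):
    alpha, beta = error_probabilities(c)
    print(f"accept H0 for x >= {c}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Moving the boundary from 15 to 16 enlarges the critical region, so α increases while β decreases.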
EXAMPLE 12.2

Suppose we want to test the null hypothesis that the mean of a normal population
with σ² = 1 is μ0 against the alternative hypothesis that it is μ1, where μ1 > μ0.
Find the value of K such that x̄ ≥ K provides a critical region of size α = 0.05
for a random sample of size n.


Solution

Referring to Figure 12.1 and Table III, we find that z = 1.645 corresponds
to an entry of 0.4500. Thus,

$$\frac{K - \mu_0}{1/\sqrt{n}} = 1.645$$

and

$$K = \mu_0 + \frac{1.645}{\sqrt{n}}$$

so that the desired critical region of size α = 0.05 is $\bar{x} \ge \mu_0 + \frac{1.645}{\sqrt{n}}$. ▲

Figure 12.1 Critical region for testing μ = μ0 against μ = μ1.


EXAMPLE 12.3 


With reference to the preceding example, determine the minimum sample size
needed to test the null hypothesis μ0 = 10 against the alternative hypothesis
μ1 = 11 with β ≤ 0.05.
Solution

The probability β of a type II error is given by the area of the ruled region
of Figure 12.1, so we get

$$\beta = P\left(\bar{x} < 10 + \frac{1.645}{\sqrt{n}};\ \mu = 11\right) = P\left(z < \frac{\left(10 + \frac{1.645}{\sqrt{n}}\right) - 11}{1/\sqrt{n}}\right) = P(z < -\sqrt{n} + 1.645)$$

Now, since $P(z < -z_{0.05}) = 0.05$, we set

$$-\sqrt{n} + 1.645 = -z_{0.05} = -1.645$$

from which we obtain n = 10.8. Therefore, the minimum sample size needed
to keep β ≤ 0.05 in this example is n = 11. ▲
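Examples 12.2 and 12.3 can be reproduced numerically. The sketch below assumes σ is known (σ = 1 in the examples) and uses the general condition $\sqrt{n} \ge (z_\alpha + z_\beta)\sigma/(\mu_1 - \mu_0)$, an assumption that reduces to the equation $-\sqrt{n} + 1.645 = -1.645$ of Example 12.3 when α = β = 0.05, σ = 1, and μ1 − μ0 = 1; the function names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def critical_value(mu0, n, alpha=0.05, sigma=1.0):
    """K with P(xbar >= K; mu = mu0) = alpha, i.e. K = mu0 + z_alpha * sigma / sqrt(n)."""
    return mu0 + STD_NORMAL.inv_cdf(1 - alpha) * sigma / sqrt(n)


def type_ii_probability(mu0, mu1, n, alpha=0.05, sigma=1.0):
    """beta = P(xbar < K; mu = mu1) for the critical region xbar >= K."""
    k = critical_value(mu0, n, alpha, sigma)
    return STD_NORMAL.cdf((k - mu1) / (sigma / sqrt(n)))


def minimum_sample_size(mu0, mu1, alpha=0.05, beta=0.05, sigma=1.0):
    """Smallest n with sqrt(n) >= (z_alpha + z_beta) * sigma / (mu1 - mu0)."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha)
    z_b = STD_NORMAL.inv_cdf(1 - beta)
    return ceil(((z_a + z_b) * sigma / (mu1 - mu0)) ** 2)


n = minimum_sample_size(10, 11)
print(n, type_ii_probability(10, 11, n), type_ii_probability(10, 11, n - 1))
```

The last line compares β at the minimum sample size with β at one observation fewer, which should exceed 0.05.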


12.3 LOSSES AND RISKS


The concepts of loss functions and risk functions that were introduced in Chapter 
9 also play an important part in the theory of hypothesis testing. In the decision 
theory approach to testing the null hypothesis that a population parameter θ
equals θ0 against the alternative that it equals θ1, the statistician either takes the
action a0 and accepts the null hypothesis, or he takes the action a1 and accepts
the alternative hypothesis. Depending on the true "state of Nature" and the
action which he takes, his losses are shown in the following table:

                           Statistician
                        a0              a1
  Nature     θ0     L(a0, θ0)       L(a1, θ0)
             θ1     L(a0, θ1)       L(a1, θ1)

These losses can be positive or negative (reflecting penalties or rewards), and
the only condition which we shall impose is that

$$L(a_0, \theta_0) < L(a_1, \theta_0) \quad\text{and}\quad L(a_1, \theta_1) < L(a_0, \theta_1)$$

namely, that in either case the right decision is more profitable than the wrong one.
As in the statistical games of Section 9.3, the statistician’s choice will depend
on the outcome of an experiment and the decision function d, which tells him
for each possible outcome what action to take. If the null hypothesis is true and
the statistician accepts the alternative hypothesis, namely, if the value of the
parameter is θ0 and the statistician takes action a1, he commits a type I error;
correspondingly, if the value of the parameter is θ1 and the statistician takes
action a0, he commits a type II error. For the decision function d, we shall let
α(d) denote the probability of committing a type I error and β(d) the probability
of committing a type II error. The values of the risk function (defined on page
322) are thus

$$R(d, \theta_0) = [1 - \alpha(d)]L(a_0, \theta_0) + \alpha(d)L(a_1, \theta_0) = L(a_0, \theta_0) + \alpha(d)[L(a_1, \theta_0) - L(a_0, \theta_0)]$$

and

$$R(d, \theta_1) = \beta(d)L(a_0, \theta_1) + [1 - \beta(d)]L(a_1, \theta_1) = L(a_1, \theta_1) + \beta(d)[L(a_0, \theta_1) - L(a_1, \theta_1)]$$

where, by assumption, the quantities in brackets are both positive. It is apparent 
from this (and should, perhaps, have been obvious from the beginning) that to 
minimize the risks the statistician must choose a decision function which, in some 
way, keeps the probabilities of both types of errors as small as possible. 
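To make the two risk formulas concrete, here is a small Python sketch that evaluates R(d, θ0) and R(d, θ1) from α(d), β(d), and a loss table, together with a Bayes risk for an assumed prior. The loss values (0 for a correct decision, 10 and 5 for the two wrong ones) and the prior probability 0.5 are purely hypothetical; the error probabilities are those of the critical regions x < 15 and x < 16 from Example 12.1.

```python
def risks(alpha, beta, loss):
    """Return (R(d, theta0), R(d, theta1)) for a decision function d.

    alpha, beta: the error probabilities alpha(d) and beta(d).
    loss: dict mapping (action, state) to L(action, state).
    """
    r0 = (1 - alpha) * loss[("a0", "theta0")] + alpha * loss[("a1", "theta0")]
    r1 = beta * loss[("a0", "theta1")] + (1 - beta) * loss[("a1", "theta1")]
    return r0, r1


def bayes_risk(prior_theta0, r0, r1):
    """Bayes risk when theta0 has prior probability prior_theta0 and theta1 the rest."""
    return prior_theta0 * r0 + (1 - prior_theta0) * r1


# Hypothetical losses: zero for a correct decision, 10 and 5 for the two wrong ones.
LOSS = {("a0", "theta0"): 0, ("a1", "theta0"): 10,
        ("a0", "theta1"): 5, ("a1", "theta1"): 0}

# Error probabilities of the critical regions x < 15 and x < 16 of Example 12.1.
for name, alpha, beta in (("x < 15", 0.0114, 0.1255), ("x < 16", 0.0433, 0.0509)):
    r0, r1 = risks(alpha, beta, LOSS)
    print(name, r0, r1, "max risk:", max(r0, r1), "Bayes risk:", bayes_risk(0.5, r0, r1))
```

Comparing the maximum risks gives the minimax choice between these two decision functions, and comparing the Bayes risks gives the Bayes choice, but only relative to the assumed losses and prior.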

If we could assign prior probabilities to θ0 and θ1, and if we knew the exact
values of all the losses L(ai, θj) in the table on page 390, we could calculate the
Bayes risk (defined on page 324) and look for the decision function which
minimizes this risk. Alternatively, if we looked upon Nature as a malevolent
opponent, we could use the minimax criterion and choose the decision function
which minimizes the maximum risk, but as must have been apparent from the
applied exercises on page 316, this is not a very realistic approach in most practical
situations.


12.4 THE NEYMAN-PEARSON LEMMA


In the theory of hypothesis testing which is nowadays referred to as “classical”
or “traditional,” namely, the Neyman-Pearson theory, we circumvent the dependence
between probabilities of type I and type II errors by limiting ourselves to
test statistics for which the probability of a type I error is less than or equal to
some constant α. In other words, we restrict ourselves to critical regions of size
less than or equal to α. We must allow for the critical region to be of size less
than α to take care of discrete random variables, where it may be impossible to
find a test statistic for which the size of the critical region is exactly equal to α.
For all practical purposes, then, we hold the probability of a type I error fixed,
and look for the test statistic which minimizes the probability of a type II error,
or equivalently, which maximizes the quantity 1 − β. When testing the null
hypothesis θ = θ0 against the alternative hypothesis θ = θ1, the quantity 1 − β
is referred to as the power of the test at θ = θ1.

A critical region for testing a simple null hypothesis θ = θ0 against a simple
alternative hypothesis θ = θ1 is said to be best or most powerful if the power of
the test at θ = θ1 is a maximum. To construct a most powerful critical region in
this kind of situation, we refer to the likelihoods (see page 351) of a random
sample of size n from the population under consideration when θ = θ0 and
θ = θ1. Denoting these likelihoods by L0 and L1, we thus have

$$L_0 = \prod_{i=1}^{n} f(x_i; \theta_0) \quad\text{and}\quad L_1 = \prod_{i=1}^{n} f(x_i; \theta_1)$$

Intuitively speaking, it stands to reason that L0/L1 should be small for sample
points inside the critical region, which lead to type I errors when θ = θ0 and to
correct decisions when θ = θ1; similarly, it stands to reason that L0/L1 should be
large for sample points outside the critical region, which lead to correct decisions
when θ = θ0 and type II errors when θ = θ1. The fact that this argument does,
indeed, guarantee a most powerful critical region is proved by the following
theorem:


THEOREM 12.1 (Neyman-Pearson Lemma) If C is a critical region of size
α and k is a constant such that

$$\frac{L_0}{L_1} \le k \quad \text{inside } C$$

and

$$\frac{L_0}{L_1} \ge k \quad \text{outside } C$$

then C is a most powerful critical region of size α for testing θ = θ0 against
θ = θ1.
Proof. Suppose that C is a critical region satisfying the conditions
of the theorem and that D is some other critical region of size α. Thus,

$$\int_C L_0\,dx = \int_D L_0\,dx = \alpha$$

where dx stands for dx1 dx2 ⋯ dxn, and the two multiple integrals are taken
over the respective n-dimensional regions C and D. Now, making use of
the fact that C is the union of the disjoint sets C ∩ D and C ∩ D′, while
D is the union of the disjoint sets C ∩ D and C′ ∩ D, we can write

$$\int_{C \cap D} L_0\,dx + \int_{C \cap D'} L_0\,dx = \int_{C \cap D} L_0\,dx + \int_{C' \cap D} L_0\,dx = \alpha$$

and, hence,

$$\int_{C \cap D'} L_0\,dx = \int_{C' \cap D} L_0\,dx$$

Then, since L1 ≥ L0/k inside C and L1 ≤ L0/k outside C, it follows that

$$\int_{C \cap D'} L_1\,dx \ge \int_{C \cap D'} \frac{L_0}{k}\,dx = \int_{C' \cap D} \frac{L_0}{k}\,dx \ge \int_{C' \cap D} L_1\,dx$$

and, hence, that

$$\int_{C \cap D'} L_1\,dx \ge \int_{C' \cap D} L_1\,dx$$

Finally,

$$\int_C L_1\,dx = \int_{C \cap D} L_1\,dx + \int_{C \cap D'} L_1\,dx \ge \int_{C \cap D} L_1\,dx + \int_{C' \cap D} L_1\,dx = \int_D L_1\,dx$$

so that

$$\int_C L_1\,dx \ge \int_D L_1\,dx$$

and this completes the proof of Theorem 12.1. The final inequality states
that for the critical region C the probability of not committing a type II
error is greater than or equal to the corresponding probability for any other
critical region of size α. (For the discrete case the proof is the same, with
summations taking the place of integrals.)


EXAMPLE 12.4 


A random sample of size n from a normal population with σ² = 1 is to be used
to test the null hypothesis μ = μ0 against the alternative hypothesis μ = μ1,
where μ1 > μ0. Use the Neyman-Pearson lemma to find the most powerful
critical region of size α.


Solution 


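The two likelihoods are

$$L_0 = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum (x_i - \mu_0)^2} \quad\text{and}\quad L_1 = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum (x_i - \mu_1)^2}$$

where the summations extend from i = 1 to i = n, and after some
simplifications their ratio becomes

$$\frac{L_0}{L_1} = e^{\frac{n}{2}(\mu_1^2 - \mu_0^2) + (\mu_0 - \mu_1)\sum x_i}$$

Thus, we must find a constant k and a region C of the sample space such
that

$$e^{\frac{n}{2}(\mu_1^2 - \mu_0^2) + (\mu_0 - \mu_1)\sum x_i} \le k \quad \text{inside } C$$

$$e^{\frac{n}{2}(\mu_1^2 - \mu_0^2) + (\mu_0 - \mu_1)\sum x_i} \ge k \quad \text{outside } C$$

and after taking logarithms, subtracting $\frac{n}{2}(\mu_1^2 - \mu_0^2)$, and dividing by the
negative quantity $n(\mu_0 - \mu_1)$, these two inequalities become

$$\bar{x} \ge K \quad \text{inside } C$$

$$\bar{x} \le K \quad \text{outside } C$$

where K is an expression in k, n, μ0, and μ1.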

In actual practice, constants like K are determined by making use of
the size of the critical region and appropriate statistical theory. In our case
(see Example 12.2) we obtain $K = \mu_0 + z_\alpha \cdot \frac{1}{\sqrt{n}}$, where zα is as defined
on page 365. Thus, the most powerful critical region of size α for testing
the null hypothesis μ = μ0 against the alternative μ = μ1 (with μ1 > μ0)
for the given normal population is

$$\bar{x} \ge \mu_0 + z_\alpha \cdot \frac{1}{\sqrt{n}}$$

and it should be noted that it does not depend on μ1. This is an important
property, to which we shall refer again in Section 12.5. ▲
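The algebra of Example 12.4 can be checked numerically: the likelihood ratio computed directly from the normal densities matches the simplified exponential form, and for the constant k corresponding to $K = \mu_0 + z_\alpha/\sqrt{n}$, the condition L0/L1 ≤ k picks out exactly the samples with x̄ ≥ K. This Python sketch uses illustrative values μ0 = 0, μ1 = 0.5, n = 16, and α = 0.05, none of which come from the text, and simulated samples in place of real data.

```python
import math
import random
from statistics import NormalDist


def normal_density(x, mu):
    """Normal density with mean mu and variance 1."""
    return math.exp(-((x - mu) ** 2) / 2) / math.sqrt(2 * math.pi)


def ratio_direct(xs, mu0, mu1):
    """L0 / L1 computed as a ratio of two products of densities."""
    l0 = math.prod(normal_density(x, mu0) for x in xs)
    l1 = math.prod(normal_density(x, mu1) for x in xs)
    return l0 / l1


def ratio_simplified(xs, mu0, mu1):
    """Simplified ratio exp[(n/2)(mu1^2 - mu0^2) + (mu0 - mu1) * sum(x_i)]."""
    n = len(xs)
    return math.exp(n * (mu1**2 - mu0**2) / 2 + (mu0 - mu1) * sum(xs))


def k_for_cutoff(cutoff, n, mu0, mu1):
    """Constant k for which L0/L1 <= k is equivalent to xbar >= cutoff (mu1 > mu0)."""
    return math.exp(n * (mu1**2 - mu0**2) / 2 + (mu0 - mu1) * n * cutoff)


# Illustrative values (not from the text).
mu0, mu1, n, alpha = 0.0, 0.5, 16, 0.05
cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) / math.sqrt(n)  # K = mu0 + z_alpha / sqrt(n)
k = k_for_cutoff(cutoff, n, mu0, mu1)

rng = random.Random(1)
for _ in range(5):
    xs = [rng.gauss(mu0, 1) for _ in range(n)]
    xbar = sum(xs) / n
    # The two boolean columns should always agree: the ratio test and the xbar test coincide.
    print(f"xbar = {xbar:+.3f}  L0/L1 <= k: {ratio_direct(xs, mu0, mu1) <= k}  xbar >= K: {xbar >= cutoff}")
```

Because μ0 − μ1 is negative, the ratio is a decreasing function of Σxi, which is why a small ratio corresponds to a large x̄.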


Note that in Example 12.4 we derived the critical region without first
mentioning that the test statistic is to be x̄. Since the specification of a critical
region thus defines a corresponding test statistic and vice versa, the two terms
are used interchangeably in the language of statistics.


THEORETICAL EXERCISES

1. Decide in each case whether the given hypothesis is simple or composite:
(a) the hypothesis that a random variable has a gamma distribution with
α = 3 and β = 2;
(b) the hypothesis that a random variable has a gamma distribution with
α = 3 and β ≠ 2;
(c) the hypothesis that a random variable has an exponential density;
(d) the hypothesis that a random variable has a beta distribution with the
mean μ = 0.50;
(e) the hypothesis that a random variable has a Poisson distribution with
…;
(f) the hypothesis that a random variable has a Poisson distribution with
λ > 1.25;
(g) the hypothesis that a random variable has a normal distribution with
the mean μ = …;
(h) the hypothesis that a random variable has a negative binomial distribution
with k = 3 and θ < 0.60.
2. A bowl contains seven marbles of which θ are red while the others are blue.
In order to test the null hypothesis θ = 2 against the alternative θ = 4, two
of the marbles are randomly drawn without replacement and the null
hypothesis is rejected if and only if both are red. Find the probabilities of
committing type I and type II errors with this criterion.

3. With reference to Example 12.1 on page 388, what would have been the
probabilities of type I and type II errors if the acceptance region had been
x ≥ 17 and the corresponding rejection region had been x < 17?

4. Let x1 and x2 constitute a random sample of size 2 from a normal population
with σ² = 1. If the null hypothesis μ = μ0 is to be rejected in favor of the
alternative hypothesis μ = μ1, where μ1 > μ0, when x̄ > μ0 + 1, what is
the size of this critical region?

5. A single observation of a random variable having an exponential distribution
is used to test the null hypothesis that the mean of the distribution is θ = 2
against the alternative that it is θ = 5. If the null hypothesis is accepted if
and only if the observed value of the random variable is less than 3, find the
probabilities of type I and type II errors.


6. Show that if μ1 < μ0 in Example 12.4, the Neyman-Pearson lemma yields
the critical region

$$\bar{x} \le \mu_0 - z_\alpha \cdot \frac{1}{\sqrt{n}}$$


7. Let x1 and x2 constitute a random sample of size 2 from the population given
by

$$f(x; \theta) = \begin{cases} \theta x^{\theta - 1} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

If the critical region x1x2 > 1 is used to test the null hypothesis θ = 1 against
the alternative hypothesis θ = 2, what is the power of this test at θ = 2?


8. A random sample of size n from an exponential population is used to test
the null hypothesis that its parameter is θ0 against the alternative that its
parameter is θ1, where θ1 > θ0. Use the Neyman-Pearson lemma to find the
most powerful critical region of size α, and use the result of Example 7.16
on page 267 to indicate how to evaluate the constant.

9. Use the Neyman-Pearson lemma to indicate how to construct the most
powerful critical region of size α to test the null hypothesis that θ, the
parameter of a binomial distribution with a given value of n, equals θ0 against
the alternative that it equals θ1 < θ0.

10. If n = 100, θ0 = 0.40, θ1 = 0.30, and α is as large as possible without
exceeding 0.05, use the normal approximation to the binomial distribution
to find the probability of committing a type II error with the criterion
constructed in Exercise 9.
11. A single observation of a random variable having a geometric distribution is
to be used to test the null hypothesis that its parameter equals θ0 against the
alternative that it equals θ1 > θ0. Use the Neyman-Pearson lemma to find
the best critical region of size α.

12. Given a random sample of size n from a normal population with μ = 0, use
the Neyman-Pearson lemma to construct the most powerful critical region
of size α to test the null hypothesis σ = σ0 against the alternative σ = σ1 > σ0.


APPLIED EXERCISES 


13. An airline wants to test the null hypothesis that 60 percent of its passengers 
object to smoking inside the plane. Explain under what conditions they would 
be committing a type I error and under what conditions they would be 
committing a type II error. 

14. A doctor is asked to give an executive a thorough physical checkup to test 
the null hypothesis that he will be able to take on additional responsibilities. 
Explain under what conditions the doctor would be committing a type I 
error and under what conditions he would be committing a type II error. 


15. Suppose that in Example 12.1 on page 388 the manufacturer of the new
medication feels that the odds are 4 to 1 that with this medication the recovery
rate from the disease is 0.90 rather than 0.60. With these odds, what are the
probabilities that he will make a wrong decision if he uses the decision
function

(a) $$d(x) = \begin{cases} a_0 & \text{for } x \ge 15 \\ a_1 & \text{for } x < 15 \end{cases}$$

(b) $$d(x) = \begin{cases} a_0 & \text{for } x \ge 16 \\ a_1 & \text{for } x < 16 \end{cases}$$

(c) $$d(x) = \begin{cases} a_0 & \text{for } x \ge 14 \\ a_1 & \text{for } x < 14 \end{cases}$$


12.5 THE POWER FUNCTION OF A TEST


In Example 12.1 we were able to give unique values for the probabilities of 
committing type I and type II errors because we were testing a simple hypothesis 
against a simple alternative. In actual practice, it is relatively rare, however, that 
simple hypotheses are tested against simple alternatives; usually one or the other, 
or both, are composite. For instance, in Example 12.1 it might well have been 
more realistic to test the null hypothesis that the recovery rate from the disease 
is θ ≥ 0.90 against the alternative that θ < 0.90, namely, the alternative that the
new medication is not as effective as claimed.

When we deal with composite hypotheses, the problem of evaluating the 
merits of a test criterion, or critical region, becomes much more difficult. In that 
case we have to consider the probabilities α(θ) of committing a type I error for
all values of θ within the domain specified under the null hypothesis H0, and
the probabilities β(θ) of committing a type II error for all values of θ within the
domain specified under the alternative hypothesis H1. It is customary to combine
the two sets of probabilities in the following way: 


DEFINITION 12.3 The power function of a test of a statistical hypothesis H0
against an alternative hypothesis H1 is given by

$$\pi(\theta) = \begin{cases} \alpha(\theta) & \text{for values of } \theta \text{ assumed under } H_0 \\ 1 - \beta(\theta) & \text{for values of } \theta \text{ assumed under } H_1 \end{cases}$$

Thus, the values of the power function are the probabilities of rejecting the null
hypothesis H0 for various values of the parameter θ. Observe also that for values
of θ assumed under H0, the power function gives the probability of committing
a type I error, and for values of θ assumed under H1, it gives the probability of
not committing a type II error.


EXAMPLE 12.5 


With reference to Example 12.1, suppose that we had wanted to test the null
hypothesis

H0: θ ≥ 0.90

against the alternative hypothesis

H1: θ < 0.90

Investigate the power function for the test criterion according to which we reject
H0 when x < 15, and otherwise we accept it.


Solution 


Choosing selected values of θ, we find from Table I the probabilities α(θ) of
getting fewer than 15 successes for θ = 0.90 and 0.95, and the probabilities
β(θ) of getting 15 or more successes for θ = 0.85, 0.80, . . . , and 0.50. These
probabilities and the corresponding values of the power function are shown
in the following table:

          Probability of    Probability of    Probability of
          type I error      type II error     rejecting H0
   θ      α(θ)              β(θ)              π(θ)
  0.95    0.0003                              0.0003
  0.90    0.0114                              0.0114
  0.85                      0.9326            0.0674
  0.80                      0.8042            0.1958
  0.75                      0.6171            0.3829
  0.70                      0.4163            0.5837
  0.65                      0.2455            0.7545
  0.60                      0.1255            0.8745
  0.55                      0.0553            0.9447
  0.50                      0.0207            0.9793

The graph of this power function is shown in Figure 12.2. Of course, it
applies only to the critical region x < 15 of Example 12.1, but it is of
interest to note that the power function of an ideal test criterion for this
problem would be given by the dashed lines of Figure 12.2. ▲


Figure 12.2 Power function of Example 12.5.
Power functions play a very important role in the evaluation of statistical
tests, particularly in the comparison of several critical regions which might all
be used to test a given null hypothesis against a given alternative. Incidentally,
if we had plotted in Figure 12.2 the probabilities of accepting H0 (instead of
those of rejecting H0), we would have obtained the operating characteristic curve,
or simply the OC-curve, of the given critical region. In other words, the values
of the operating characteristic function, used mainly in industrial applications,
are given by 1 − π(θ).
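The table of Example 12.5 and the corresponding OC-curve values 1 − π(θ) can be recomputed with exact binomial probabilities. In this Python sketch (helper names hypothetical) π(θ) is the probability of the critical region x < 15 with n = 20; the results match the tabled values to within one or two units in the fourth decimal place, the small discrepancies coming from the rounding in Table I.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with parameters n and p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


def power(theta, n=20, c=15):
    """pi(theta) = P(reject H0; theta) for the critical region x < c."""
    return binom_cdf(c - 1, n, theta)


thetas = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50]
for theta in thetas:
    pi = power(theta)
    # For theta >= 0.90 (under H0), pi is alpha(theta); below 0.90 it is 1 - beta(theta).
    print(f"theta = {theta:.2f}  pi = {pi:.4f}  OC = {1 - pi:.4f}")
```

The same function serves for both halves of Definition 12.3, since in each case it is simply the probability of landing in the critical region.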

On page 391 we indicated that in the Neyman-Pearson theory of testing
hypotheses we hold α, the probability of a type I error, fixed, and this requires
that the null hypothesis H0 be a simple hypothesis, say, θ = θ0. As a result, the
power function of any test of this null hypothesis will pass through the point
(θ0, α), the only point at which the value of a power function is the probability
of making an error. This facilitates the comparison of the power functions of
several critical regions, which are all designed to test the simple null hypothesis
θ = θ0 against a composite alternative, say, the alternative hypothesis θ ≠ θ0.
To illustrate, consider Figure 12.3, giving the power functions of three different
critical regions, or test criteria, designed for this purpose. Since for each value
of θ except θ0 the values of power functions are probabilities of making correct
decisions, it is desirable to have them as close to 1 as possible. Thus, it can be
seen by inspection that the critical region whose power function is given by the
dotted curve of Figure 12.3 is preferable to the critical region whose power
function is given by the dashed curve. The probability of not committing
a type II error with the first of these critical regions always exceeds that of the
second, and we say that the first critical region is uniformly more powerful than
the second; also, the second critical region is said to be inadmissible.

The same clear-cut distinction is not possible if we attempt to compare the
critical regions whose power functions are given by the dotted and solid curves
of Figure 12.3; in this case the first one is preferable for θ < θ0, while the other
is preferable for θ > θ0. In situations like this we need further criteria for
comparing power functions, for instance that of Exercise 16 on page 411. Note
that if the alternative hypothesis had been θ > θ0, the critical region whose
power function is given by the solid curve would have been uniformly more
powerful than the critical region whose power function is given by the dotted
curve.
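Figure 12.3 does not say which tests its three curves belong to. As a hypothetical illustration of the dotted-versus-solid comparison, the Python sketch below uses a normal population with σ = 1 and compares two critical regions of the same size α for testing μ = μ0: the one-sided region $\bar{x} \ge \mu_0 + z_\alpha/\sqrt{n}$ of Example 12.4 and a two-sided region $|\bar{x} - \mu_0| \ge z_{\alpha/2}/\sqrt{n}$, which is an assumption introduced here. Both power functions pass through (μ0, α); the two-sided region is more powerful for μ < μ0 and the one-sided region for μ > μ0, the kind of crossing described in the text. The values n = 25 and α = 0.05 are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_one_sided(mu, mu0=0.0, n=25, alpha=0.05):
    """P(xbar >= mu0 + z_alpha/sqrt(n)) when xbar is normal with mean mu and variance 1/n."""
    shift = (mu - mu0) * sqrt(n)
    return 1 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - shift)


def power_two_sided(mu, mu0=0.0, n=25, alpha=0.05):
    """P(|xbar - mu0| >= z_{alpha/2}/sqrt(n)) when xbar is normal with mean mu and variance 1/n."""
    shift = (mu - mu0) * sqrt(n)
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(-z - shift) + 1 - STD_NORMAL.cdf(z - shift)


for mu in (-0.6, -0.3, 0.0, 0.3, 0.6):
    print(f"mu = {mu:+.1f}  one-sided = {power_one_sided(mu):.4f}  two-sided = {power_two_sided(mu):.4f}")
```

Restricting attention to μ > μ0, the one-sided region dominates, which mirrors the remark that the solid curve would be uniformly more powerful against the alternative θ > θ0.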

In general, when testing a simple hypothesis against a composite alternative,
we specify α, the probability of committing a type I error, and refer to one critical
region of size α as uniformly more powerful than another if the values of its
power function are always greater than or equal to those of the other, with the
strict inequality holding for at least one value of the parameter under
consideration. If, for a given problem, a critical region of size α is uniformly more
powerful than any other critical region of size α, it is said to be uniformly most
powerful; unfortunately, uniformly most powerful critical regions rarely exist
when we test a simple hypothesis against a composite alternative. Of course,
when we test a simple hypothesis against a simple alternative, a most powerful
critical region of size α, as defined on page 392, is, in fact, uniformly most
powerful.

Figure 12.3 Power functions.

Until now we have always assumed that the acceptance of $H_0$ is equivalent to the rejection of $H_1$, and vice versa, but this is not the case, for example, in multistage or sequential tests, where the alternatives are to accept $H_0$, to accept $H_1$, or to defer the decision until more data have been obtained. It is also not the case in so-called tests of significance, where the alternative to rejecting $H_0$ is reserving judgment instead of accepting $H_0$. For instance, if we want to test the null hypothesis that a coin is perfectly balanced against the alternative that this is not the case, and 100 tosses yield 57 heads and 43 tails, this will not enable us to reject the null hypothesis when $\alpha = 0.05$ (see Exercise 6 on page 409). However, since we obtained quite a few more heads than the 50 which we can expect for a balanced coin, we may well be reluctant to accept the null hypothesis as true. To avoid this, we can say that the difference between 50 and 57, the number of heads which we expected and the number of heads which we obtained, may reasonably be attributed to chance, or we can say that this difference is not large enough to reject the null hypothesis. In either case, we do not really commit ourselves one way or the other, and so long as we do not actually accept the null hypothesis, we cannot commit a type II error. It is mainly in connection with tests of this kind that we refer to the probability of a type I error as the level of significance.
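The coin example can be checked numerically. The Python sketch below is only an illustration: it uses the normal approximation to the binomial distribution without a continuity correction, and the helper name `coin_z` is hypothetical. With $n = 100$ and $p_0 = \frac{1}{2}$ the standard deviation of the number of heads is 5, so $z = (57 - 50)/5 = 1.4$, which falls short of $z_{0.025} \approx 1.96$.

```python
from math import sqrt
from statistics import NormalDist


def coin_z(heads: int, n: int, p0: float = 0.5) -> float:
    """Normal-approximation z statistic for a binomial count (no continuity correction)."""
    return (heads - n * p0) / sqrt(n * p0 * (1 - p0))


alpha = 0.05
z = coin_z(57, 100)                           # (57 - 50) / 5
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
reject = abs(z) >= z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Applying a continuity correction, $(56.5 - 50)/5 = 1.3$, would not change the conclusion.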
12.6 LIKELIHOOD RATIO TESTS


The Neyman-Pearson lemma provides a means of constructing most powerful critical regions for testing a simple null hypothesis against a simple alternative hypothesis, but it does not always apply to composite hypotheses. We shall now present a general method for constructing critical regions for tests of composite hypotheses which in most cases have very satisfactory properties. The resulting tests, called likelihood ratio tests, are based on a generalization of the method of Section 12.4, but they are not necessarily uniformly most powerful. We shall discuss this method here with reference to tests concerning one parameter $\theta$ and continuous populations, but all of our arguments can easily be extended to the multiparameter case and to discrete populations.

To illustrate the likelihood ratio technique, let us suppose that $x_1, x_2, \ldots, x_n$ constitute a random sample of size $n$ from a population whose density at $x$ is $f(x; \theta)$, and that $\Omega$ is the set of values which can be taken on by the parameter $\theta$. We often refer to $\Omega$ as the parameter space for $\theta$. The null hypothesis we shall want to test is

$$H_0\colon \theta \in \omega$$

and the alternative hypothesis is

$$H_1\colon \theta \in \omega'$$

where $\omega$ is a subset of $\Omega$ and $\omega'$ is the complement of $\omega$ with respect to $\Omega$. Thus, the parameter space for $\theta$ is partitioned into the disjoint sets $\omega$ and $\omega'$; according to the null hypothesis $\theta$ is an element of the first set, and according to the alternative hypothesis it is an element of the second set. In most problems $\Omega$ is either the set of all real numbers, the set of all positive real numbers, some interval of real numbers, or a discrete set of real numbers.

When $H_0$ and $H_1$ are both simple hypotheses, $\omega$ and $\omega'$ each have only one element, and in Section 12.4 we constructed tests by comparing the likelihoods $L_0$ and $L_1$. In the general case, where at least one of the two hypotheses is composite, we compare instead the two quantities $\max L_0$ and $\max L$, where $\max L_0$ is the maximum value of the likelihood function (see page 351) for all values of $\theta$ in $\omega$, and $\max L$ is the maximum value of the likelihood function for all values of $\theta$ in $\Omega$. In other words, if we have a random sample of size $n$ from a population whose density at $x$ is $f(x; \theta)$, $\hat{\hat{\theta}}$ is the maximum likelihood estimate of $\theta$ subject to the restriction that $\theta$ must be an element of $\omega$, and $\hat{\theta}$ is the maximum likelihood estimate of $\theta$ for all values of $\theta$ in $\Omega$, then

$$\max L_0 = \prod_{i=1}^{n} f(x_i; \hat{\hat{\theta}})$$

and

$$\max L = \prod_{i=1}^{n} f(x_i; \hat{\theta})$$

These quantities are both values of random variables since they depend on the observed values $x_1, x_2, \ldots, x_n$, and their ratio

$$\lambda = \frac{\max L_0}{\max L}$$

is referred to as a value of the likelihood ratio statistic $\Lambda$.

Since $\max L_0$ and $\max L$ are both values of a likelihood function, and therefore never negative, it follows that $\lambda \geq 0$; also, since $\omega$ is a subset of the parameter space $\Omega$, it follows that $\lambda \leq 1$. When the null hypothesis is false, we would expect $\max L_0$ to be small compared to $\max L$, in which case $\lambda$ would be close to zero. On the other hand, when the null hypothesis is true and $\theta \in \omega$, we would expect $\max L_0$ to be close to $\max L$, in which case $\lambda$ would be close to 1. A likelihood ratio test states, therefore, that the null hypothesis $H_0$ is rejected if and only if $\lambda$ falls in a critical region of the form $\lambda \leq k$, where $0 < k < 1$. To summarize,


DEFINITION 12.4 If $\omega$ and $\omega'$ are complementary subsets of the parameter space $\Omega$, and if

$$\lambda = \frac{\max L_0}{\max L}$$

where $\max L_0$ and $\max L$ are the maximum values of the likelihood function for all values of $\theta$ in $\omega$ and $\Omega$, respectively, then the critical region

$$\lambda \leq k$$

where $0 < k < 1$, defines a likelihood ratio test of the null hypothesis $\theta \in \omega$ against the alternative hypothesis $\theta \in \omega'$.


If $H_0$ is a simple hypothesis, $k$ is chosen so that the size of the critical region equals $\alpha$; if $H_0$ is composite, $k$ is chosen so that the probability of a type I error is less than or equal to $\alpha$ for all $\theta$ in $\omega$, and equal to $\alpha$, if possible, for at least one value of $\theta$ in $\omega$. Thus, if $H_0$ is a simple hypothesis and $g(\lambda)$ is the density of $\Lambda$ at $\lambda$ when $H_0$ is true, then $k$ must be such that

$$P(\Lambda \leq k) = \int_0^k g(\lambda)\, d\lambda = \alpha$$

In the discrete case, the integral is replaced by a sum and $k$ is taken to be the largest value for which the sum is less than or equal to $\alpha$.


EXAMPLE 12.6

Find the critical region of the likelihood ratio test for testing the null hypothesis

$$H_0\colon \mu = \mu_0$$

against the composite alternative

$$H_1\colon \mu \neq \mu_0$$

on the basis of a random sample of size $n$ from a normal population with the known variance $\sigma^2$.


Solution
Since $\omega$ contains only $\mu_0$, it follows that $\hat{\hat{\mu}} = \mu_0$, and since $\Omega$ is the set of all real numbers, it follows by the method of Section 10.7 that $\hat{\mu} = \bar{x}$. Thus,

$$\max L_0 = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum (x_i - \mu_0)^2}$$

and

$$\max L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum (x_i - \bar{x})^2}$$

where the summations extend from $i = 1$ to $i = n$, and the value of the likelihood ratio statistic becomes

$$\lambda = \frac{e^{-\frac{1}{2\sigma^2}\sum (x_i - \mu_0)^2}}{e^{-\frac{1}{2\sigma^2}\sum (x_i - \bar{x})^2}} = e^{-\frac{n}{2\sigma^2}(\bar{x} - \mu_0)^2}$$

after suitable simplifications which the reader will be asked to verify in Exercise 7 on page 409. Hence, the critical region of the likelihood ratio test is

$$e^{-\frac{n}{2\sigma^2}(\bar{x} - \mu_0)^2} \leq k$$

and, after taking logarithms and dividing by $-\frac{n}{2\sigma^2}$, it becomes

$$(\bar{x} - \mu_0)^2 \geq -\frac{2\sigma^2}{n} \ln k$$

or

$$|\bar{x} - \mu_0| \geq K$$

where $K$ will have to be determined so that the size of the critical region is $\alpha$. Note that $\ln k$ is negative in view of the fact that $0 < k < 1$, so that $K$ is a positive constant.

Since $\bar{x}$ has a normal distribution with the mean $\mu_0$ and the variance $\frac{\sigma^2}{n}$ (see Theorem 8.4 on page 271), we find that the critical region of this likelihood ratio test is

$$|\bar{x} - \mu_0| \geq z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

or, equivalently,

$$|z| \geq z_{\alpha/2}$$

where

$$z = \frac{\bar{x} - \mu_0}{\sigma}\sqrt{n}$$

In other words, the null hypothesis must be rejected when $z$ takes on a value greater than or equal to $z_{\alpha/2}$ or a value less than or equal to $-z_{\alpha/2}$. ▲

In the preceding example it was easy to find the constant that made the size of the critical region equal to $\alpha$, because we were able to refer to the known distribution of $\bar{x}$ and did not have to derive the distribution of the likelihood ratio statistic $\Lambda$ itself. Since the distribution of $\Lambda$ is generally quite complicated, which makes it difficult to evaluate $k$, it is often preferable to use the following approximation, whose proof is referred to on page 411.


THEOREM 12.2 For large $n$, the distribution of $-2 \cdot \ln \Lambda$ approaches, under very general conditions, the chi-square distribution with 1 degree of freedom.


We should add that this theorem applies only to the one-parameter case; if the population involves more than one unknown parameter, upon which the null hypothesis imposes $r$ restrictions, the number of degrees of freedom in the chi-square approximation of the distribution of $-2 \cdot \ln \Lambda$ is equal to $r$. Thus, if we want to test the null hypothesis that the unknown mean and variance of a normal population are, respectively, $\mu = \mu_0$ and $\sigma^2 = \sigma_0^2$ against the alternative hypothesis that $\mu \neq \mu_0$ or $\sigma^2 \neq \sigma_0^2$, the number of degrees of freedom in the chi-square approximation of the distribution of $-2 \cdot \ln \Lambda$ would be 2; the two restrictions are $\mu = \mu_0$ and $\sigma^2 = \sigma_0^2$.

Since small values of $\lambda$ correspond to large values of $-2 \cdot \ln \lambda$, we can use Theorem 12.2 to write the critical region of this approximate likelihood ratio test as

$$-2 \cdot \ln \lambda \geq \chi^2_{\alpha,1}$$

where $\chi^2_{\alpha,1}$ is as defined on page 288. In connection with Example 12.6 we find that

$$-2 \cdot \ln \lambda = \frac{n}{\sigma^2}(\bar{x} - \mu_0)^2 = \left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)^2$$

which actually is a value of a random variable having the chi-square distribution with 1 degree of freedom.
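The identity $-2 \cdot \ln \lambda = z^2$ from Example 12.6 can be confirmed numerically by computing both maximized log-likelihoods directly. The sketch below uses a made-up sample and made-up values of $\mu_0$ and $\sigma$ purely for illustration, and the function names are hypothetical. It also relies on the fact that $\chi^2_{\alpha,1} = z_{\alpha/2}^2$, so in this case the approximate test and the exact test of Example 12.6 coincide.

```python
from math import log, pi, sqrt
from statistics import NormalDist


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of a sample from N(mu, sigma^2), sigma known."""
    n = len(xs)
    return -n * log(sigma * sqrt(2 * pi)) - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)


def lr_normal_mean(xs, mu0, sigma):
    """Return ln(lambda) and z for H0: mu = mu0 (the setting of Example 12.6)."""
    n = len(xs)
    xbar = sum(xs) / n  # unrestricted MLE of mu; the restricted MLE is mu0
    log_lambda = normal_loglik(xs, mu0, sigma) - normal_loglik(xs, xbar, sigma)
    z = (xbar - mu0) / (sigma / sqrt(n))
    return log_lambda, z


xs = [10.3, 9.1, 11.8, 10.9, 12.2, 9.7]  # hypothetical data, for illustration only
alpha = 0.05
log_lambda, z = lr_normal_mean(xs, mu0=10.0, sigma=1.5)
chi2_crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi^2_{0.05,1} = z_{0.025}^2, about 3.84
print(f"-2 ln(lambda) = {-2 * log_lambda:.4f}, z^2 = {z ** 2:.4f}")
print("reject H0:", -2 * log_lambda >= chi2_crit)
```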

As we indicated on page 402, the likelihood ratio technique will generally produce satisfactory results. That this is not always the case is illustrated by the following example, which is somewhat out of the ordinary:


EXAMPLE 12.7

On the basis of a single observation, we want to test the simple null hypothesis that the probability distribution of $X$ is

x       1      2      3      4     5     6     7
f(x)   1/12   1/12   1/12   1/4   1/6   1/6   1/6

against the composite alternative that the probability distribution is

x       1     2     3     4     5   6   7
g(x)   a/3   b/3   c/3   2/3   0   0   0

where $a + b + c = 1$. Show that the critical region obtained by means of the likelihood ratio technique is inadmissible.

Solution
The composite alternative hypothesis includes all the probability distributions which we get by assigning different values from 0 to 1 to $a$, $b$, and $c$, subject only to the restriction that $a + b + c = 1$. To determine $\lambda$ for each value of $x$, we first let $x = 1$. For this value we get $\max L_0 = \frac{1}{12}$, $\max L = \frac{1}{3}$ (corresponding to $a = 1$), and, hence, $\lambda = \frac{1}{4}$. Determining $\lambda$ for the other values of $x$ in the same way, we get the results shown in the following table:

x     1     2     3     4     5   6   7
λ    1/4   1/4   1/4   3/8   1   1   1

If the size of the critical region is to be $\alpha = 0.25$, we find that the likelihood ratio technique yields the critical region for which the null hypothesis is rejected when $\lambda = \frac{1}{4}$, namely, when $x = 1$, $x = 2$, or $x = 3$; clearly, $f(1) + f(2) + f(3) = \frac{1}{12} + \frac{1}{12} + \frac{1}{12} = 0.25$. The corresponding probability of a type II error is given by $g(4) + g(5) + g(6) + g(7)$, and hence, it equals $\frac{2}{3}$.

Now let us consider the critical region for which the null hypothesis is rejected only when $x = 4$. Its size is also $\alpha = 0.25$ since $f(4) = \frac{1}{4}$, but the corresponding probability of a type II error is

$$g(1) + g(2) + g(3) + g(5) + g(6) + g(7) = \frac{a}{3} + \frac{b}{3} + \frac{c}{3} + 0 + 0 + 0 = \frac{1}{3}$$

and since this is less than $\frac{2}{3}$, the critical region obtained by means of the likelihood ratio technique is inadmissible. ▲
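The $\lambda$ table and the two type II error probabilities of Example 12.7 can be reproduced with exact rational arithmetic. In this sketch the helper names are hypothetical, and the particular values $a = \frac{1}{2}$, $b = c = \frac{1}{4}$ are an arbitrary choice for illustration; as the solution shows, the type II error probabilities do not depend on them.

```python
from fractions import Fraction as F

# Null distribution f(x), x = 1, ..., 7 (Example 12.7).
f = {1: F(1, 12), 2: F(1, 12), 3: F(1, 12), 4: F(1, 4),
     5: F(1, 6), 6: F(1, 6), 7: F(1, 6)}


def g(a, b, c):
    """Alternative distribution g(x) for given a, b, c with a + b + c = 1."""
    return {1: a / 3, 2: b / 3, 3: c / 3, 4: F(2, 3), 5: F(0), 6: F(0), 7: F(0)}


def max_g(x):
    """Largest value of g(x) over all admissible a, b, c."""
    if x in (1, 2, 3):
        return F(1, 3)  # put all the weight (a, b, or c) on this x
    return F(2, 3) if x == 4 else F(0)


# Omega also contains the null distribution, so max L = max(f(x), max_g(x)).
lam = {x: f[x] / max(f[x], max_g(x)) for x in f}

lr_region = {x for x in f if lam[x] == min(lam.values())}  # where lambda is smallest
other_region = {4}


def size(region):
    return sum(f[x] for x in region)


def type_ii(region, a, b, c):
    """P(accept H0) under the alternative with parameters a, b, c."""
    gx = g(a, b, c)
    return sum(gx[x] for x in gx if x not in region)


a, b, c = F(1, 2), F(1, 4), F(1, 4)  # arbitrary illustrative choice
print({x: str(v) for x, v in lam.items()})
print(size(lr_region), size(other_region))  # 1/4 1/4
print(type_ii(lr_region, a, b, c), type_ii(other_region, a, b, c))  # 2/3 1/3
```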
THEORETICAL EXERCISES

1. A bowl contains 7 marbles of which $\theta$ are red while the others are blue. In order to test the null hypothesis $\theta \leq 2$ against the alternative $\theta > 2$, two of the marbles are randomly drawn without replacement and the null hypothesis is rejected if and only if both are red.
(a) Find the probabilities of committing type I errors when $\theta = 0$, 1, and 2.
(b) Find the probabilities of committing type II errors when $\theta = 3$, 4, 5, 6, and 7.
Also plot the graph of the power function.


2. Suppose that in Example 12.5 on page 398 we accept the null hypothesis $\theta \geq 0.90$ if $x > 16$ and reject it in favor of the alternative hypothesis $\theta < 0.90$ if $x \leq 16$. Construct the power function of this test criterion by calculating $\pi(\theta)$ for the same values of $\theta$ as in the table on page 399.


3. A single observation is to be used to test the null hypothesis that the parameter $\theta$ of an exponential distribution equals 10 against the alternative hypothesis that it does not equal 10. If the null hypothesis is to be rejected if and only if the observed value is less than 8 or greater than 12, find
(a) the probability of a type I error;
(b) the probabilities of type II errors when $\theta = 2$, 4, 8, 16, and 20.
Also plot the graph of the power function.

4. A random sample of size 64 is to be used to test the null hypothesis that the mean of a normal population with the variance $\sigma^2 = 256$ is less than or equal to 40 against the alternative hypothesis that it is greater than 40. If the null hypothesis is to be rejected if and only if the mean of the random sample exceeds 43, find
(a) the probabilities of type I errors when $\mu = 37$, 38, 39, and 40;
(b) the probabilities of type II errors when $\mu = 41$, 42, 43, 44, 45, 46, 47, and 48.
Also plot the graph of the power function.


5. The sum of the values obtained in a random sample of size 5 from a Poisson population is to be used to test the null hypothesis that the mean of the population is greater than 2 against the alternative hypothesis that it is less than or equal to 2. If the null hypothesis is to be rejected if and only if the sum of the observations is 5 or less, find
(a) the probabilities of type I errors when the mean of the population is 2.2, 2.4, 2.6, 2.8, and 3.0;
(b) the probabilities of type II errors when the mean of the population is 2.0, 1.5, 1.0, and 0.5.
Also plot the graph of the power function. (Hint: Use the result obtained in Example 7.15 on page 267.)


6. Verify the statement on page 401 that 57 heads and 43 tails in 100 flips of a coin do not enable us to reject the null hypothesis that the coin is perfectly balanced (against the alternative that it is not perfectly balanced) at the level of significance $\alpha = 0.05$. (Hint: Use the normal approximation to the binomial distribution.)


7. Verify the final step on page 405 which led to

$$\lambda = e^{-\frac{n}{2\sigma^2}(\bar{x} - \mu_0)^2}$$


8. The number of successes in $n$ trials is to be used to test the null hypothesis that the parameter $\theta$ of a binomial population equals $\frac{1}{2}$ against the alternative that it does not equal $\frac{1}{2}$.
(a) Find an expression for the likelihood ratio statistic.
(b) Use the result of part (a) to show that the critical region of the likelihood ratio test can be written as

$$x \cdot \ln x + (n - x) \cdot \ln(n - x) \geq K$$

where $x$ is the observed number of successes.
(c) Studying the graph of $f(x) = x \cdot \ln x + (n - x) \cdot \ln(n - x)$, its minimum, and its symmetry, show that the critical region of this likelihood ratio test can also be written as $\left|x - \frac{n}{2}\right| \geq c$, where $c$ is a constant which depends on the size of the critical region.


9. A random sample of size $n$ is to be used to test the null hypothesis that the parameter $\theta$ of an exponential population equals $\theta_0$ against the alternative that it does not equal $\theta_0$.
(a) Find an expression for the likelihood ratio statistic.
(b) Use the result of part (a) to show that the critical region of the likelihood ratio test can be written as

$$\bar{x} \cdot e^{-\bar{x}/\theta_0} \leq K$$


10. A random sample of size $n$ from a normal population with unknown mean and variance is to be used to test the null hypothesis $\mu = \mu_0$ against the alternative $\mu \neq \mu_0$. Using the simultaneous maximum likelihood estimates of $\mu$ and $\sigma^2$ obtained in Example 10.13, show that the values of the likelihood ratio statistic can be written in the form

$$\lambda = \left(1 + \frac{t^2}{n - 1}\right)^{-n/2}$$

where $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$. Note that the likelihood ratio test can, thus, be based on the $t$ distribution of Section 8.5.


11. For the likelihood ratio statistic of the preceding exercise, show that $-2 \cdot \ln \Lambda$ approaches $t^2$ as $n \to \infty$. [Hint: Use the infinite series expansion of $\ln(1 + x)$ given on page 228.]


12. Given a random sample of size $n$ from a normal population with unknown mean and variance, find an expression for the likelihood ratio statistic for testing the null hypothesis $\sigma = \sigma_0$ against the alternative hypothesis $\sigma \neq \sigma_0$. (Hint: See Example 10.13 on page 253.)


13. Independent random samples of sizes $n_1, n_2, \ldots$, and $n_k$ from $k$ normal populations with unknown means and variances are to be used to test the null hypothesis $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ against the alternative that these variances are not all equal.
(a) Show that under the null hypothesis the maximum likelihood estimates of the means $\mu_i$ and the variances $\sigma_i^2$ are

$$\hat{\hat{\mu}}_i = \bar{x}_i \quad \text{and} \quad \hat{\hat{\sigma}}_i^2 = \sum_{j=1}^{k} \frac{(n_j - 1)s_j^2}{n}$$

where $n = \sum_{j=1}^{k} n_j$, while without restrictions the maximum likelihood estimates of the means $\mu_i$ and the variances $\sigma_i^2$ are

$$\hat{\mu}_i = \bar{x}_i \quad \text{and} \quad \hat{\sigma}_i^2 = \frac{(n_i - 1)s_i^2}{n_i}$$

This follows directly from the results obtained in Section 10.7.
(b) Using the results of part (a), show that the likelihood ratio statistic can be written as

$$\lambda = \frac{\displaystyle\prod_{i=1}^{k} \left[\frac{(n_i - 1)s_i^2}{n_i}\right]^{n_i/2}}{\left[\displaystyle\sum_{i=1}^{k} \frac{(n_i - 1)s_i^2}{n}\right]^{n/2}}$$

(c) If $n_1 = 8$, $s_1^2 = 16$, $n_2 = 10$, $s_2^2 = 25$, $n_3 = 6$, $s_3^2 = 12$, $n_4 = 8$, and $s_4^2 = 24$ are the sample sizes and the variances of four independent random samples from four normal populations, use the result of part (b) to calculate $-2 \cdot \ln \lambda$ and then test the null hypothesis stated at the beginning of this exercise. (Note that the number of degrees of freedom for this approximate chi-square test is 3, since $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$ imposes 3 restrictions on the parameters.)
14. Show that for $k = 2$ the likelihood ratio statistic of the preceding exercise can be expressed in terms of the ratio of the two sample variances and that the likelihood ratio test can, therefore, be based on the $F$ distribution.


15. If 15, 28, 3, 12, 42, 19, 20, 2, 25, 30, 62, 12, 18, 16, 44, 65, 33, 51, 4, and 28 are the values of a random sample from an exponential population, use part (a) of Exercise 9 and Theorem 12.2 to test the null hypothesis that the mean of the population is 15 against the composite alternative that it is not equal to 15. Let $\alpha$, the size of the critical region, be 0.05.


16. When we test a simple null hypothesis against a composite alternative, a critical region is said to be unbiased if the corresponding power function takes on its minimum value at the value of the parameter assumed under the null hypothesis. In other words, a critical region is unbiased if the probability of rejecting the null hypothesis is least when the null hypothesis is true. Given a single observation of the random variable $X$ having the density

$$f(x) = \begin{cases} 1 + \theta^2\left(\frac{1}{2} - x\right) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

where $-1 < \theta < 1$, show that the critical region $x < \alpha$ provides an unbiased and uniformly most powerful critical region of size $\alpha$ for testing the null hypothesis $\theta = 0$ against the alternative hypothesis $\theta \neq 0$.
13


Hypothesis Testing:
Applications


13.1 INTRODUCTION


In Chapter 12 we discussed some of the theory which underlies statistical tests, 
and in this chapter we shall present some of the standard tests that are most 
widely used in applications. Most of these tests, at least those based on known 
population distributions, can be obtained by the likelihood ratio technique. 

To explain the terminology which we shall use, let us consider a situation in which we want to test the null hypothesis $H_0\colon \theta = \theta_0$ against the two-sided alternative hypothesis $H_1\colon \theta \neq \theta_0$. Since it appears reasonable to accept the null hypothesis when our point estimate $\hat{\theta}$ of $\theta$ is close to $\theta_0$, and to reject it when $\hat{\theta}$ is much larger or much smaller than $\theta_0$, it would be logical to let the critical region consist of both tails of the sampling distribution of our test statistic $\hat{\Theta}$. Such a test is referred to as a two-tailed test.

On the other hand, if we are testing the null hypothesis $H_0\colon \theta = \theta_0$ against the one-sided alternative $H_1\colon \theta < \theta_0$, it would seem reasonable to reject $H_0$ only when $\hat{\theta}$ is much smaller than $\theta_0$. Therefore, in this case it would be logical to let the critical region consist only of the left tail of the sampling distribution of $\hat{\Theta}$. Likewise, in testing $H_0\colon \theta = \theta_0$ against the one-sided alternative $H_1\colon \theta > \theta_0$, we reject $H_0$ only for large values of $\hat{\theta}$, and the critical region consists only of the right tail of the sampling distribution of $\hat{\Theta}$. Any test where the critical region consists only of one tail of the sampling distribution of the test statistic is called a one-tailed test.
For instance, for the two-sided alternative $\mu \neq \mu_0$ in Example 12.6 on page 404, the likelihood ratio technique led to a two-tailed test with the critical region

$$|\bar{x} - \mu_0| \geq z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

or

$$\bar{x} \leq \mu_0 - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar{x} \geq \mu_0 + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

As is pictured in Figure 13.1, the null hypothesis $\mu = \mu_0$ is rejected if $\bar{X}$ takes on a value falling in either tail of its sampling distribution. In terms of the statistic $z$, the critical region can be stated in the form $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$, where

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

[Figure 13.1 Critical region for $H_1\colon \mu \neq \mu_0$ (reject $H_0$ in either tail, each of area $\alpha/2$; accept $H_0$ in between).]


Had we used the one-sided alternative $\mu > \mu_0$, the likelihood ratio technique would have led to the one-tailed test pictured in Figure 13.2, and if we had used the one-sided alternative $\mu < \mu_0$, the likelihood ratio technique would have led to the one-tailed test pictured in Figure 13.3. It certainly stands to reason that in the first case we would reject the null hypothesis and accept the alternative only when $\bar{x}$ is large, namely, when it falls into the right tail of the sampling distribution, and that the opposite is true when the alternative hypothesis is $\mu < \mu_0$. Although there are exceptions to this rule (see Exercise 1 on page 421), two-sided alternatives usually lead to two-tailed tests and one-sided alternatives usually lead to one-tailed tests.

[Figure 13.2 Critical region for $H_1\colon \mu > \mu_0$.]

[Figure 13.3 Critical region for $H_1\colon \mu < \mu_0$.]

In the remainder of this chapter, the outline of each test procedure will consist of the following four steps:

1. State the null hypothesis $H_0$ and an appropriate alternative hypothesis $H_1$.
2. Using the sampling distribution of an appropriate test statistic, determine a critical region of size $\alpha$, where $\alpha$ is specified.
3. Compute the value of the test statistic from sample data.
4. Decide whether to reject the null hypothesis, whether to accept it, or whether to reserve judgment.


13.2 TESTS CONCERNING MEANS


In this section we shall discuss the most commonly used tests concerning the 
mean of a population, and in Section 13.3 we shall discuss the corresponding 
tests concerning the means of two populations. Tests concerning the means of 
more than two populations will be taken up later in Chapter 15. All the tests in 
this section are based on normal distribution theory, assuming either that the 
samples come from normal populations or that they are large enough to justify
normal approximations; some nonparametric alternatives to these tests, which 
do not require knowledge about the population or populations from which the 
samples are obtained, will be taken up in Chapter 16. 

Suppose that we want to test the null hypothesis $\mu = \mu_0$ against one of the alternatives $\mu \neq \mu_0$, $\mu > \mu_0$, or $\mu < \mu_0$ on the basis of a random sample of size $n$ from a normal population with the known variance $\sigma^2$. This, of course, is the test that was considered in Example 12.6 to illustrate the likelihood ratio technique, and the critical regions for the respective alternatives are $|z| \geq z_{\alpha/2}$, $z \geq z_{\alpha}$, and $z \leq -z_{\alpha}$, where

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

The most frequently used values of $\alpha$, the probability of a type I error, are 0.05 and 0.01, and as the reader was asked to show in Exercise 17 on page 237, the corresponding values of $z_{\alpha}$ and $z_{\alpha/2}$ are $z_{0.05} = 1.645$, $z_{0.01} = 2.33$, $z_{0.025} = 1.96$, and $z_{0.005} = 2.575$.
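These critical values can also be computed rather than read from a table; Python's standard-library `statistics.NormalDist` provides an `inv_cdf` method for this. The short sketch below (the helper name `z_upper` is ours) computes $z_{\alpha}$ as the value with upper-tail area $\alpha$. It gives 2.326 and 2.576 for $z_{0.01}$ and $z_{0.005}$, which the text rounds to 2.33 and 2.575.

```python
from statistics import NormalDist


def z_upper(alpha: float) -> float:
    """z_alpha: the value with upper-tail area alpha under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.05, 0.01, 0.025, 0.005):
    print(f"z_{alpha} = {z_upper(alpha):.3f}")
```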


EXAMPLE 13.1

Suppose that it is known from experience that the standard deviation of the weight of 8-ounce packages of cookies made by a certain bakery is 0.16 ounce. To check whether its production is under control on a given day, namely, to check whether the true average weight of the packages is 8 ounces, they select a random sample of 25 packages and find that their mean weight is $\bar{x} = 8.112$ ounces. Since the bakery stands to lose money when $\mu > 8$ and the customer loses out when $\mu < 8$, test the null hypothesis $\mu = 8$ against the alternative $\mu \neq 8$ using $\alpha = 0.01$.

Solution
1. $H_0\colon \mu = 8$
   $H_1\colon \mu \neq 8$
2. Reject the null hypothesis if $z \leq -2.575$ or $z \geq 2.575$, where
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
3. Substituting $\bar{x} = 8.112$, $\mu_0 = 8$, $\sigma = 0.16$, and $n = 25$, we get
$$z = \frac{8.112 - 8}{0.16/\sqrt{25}} = 3.50$$
4. Since $z = 3.50$ exceeds 2.575, reject the null hypothesis and make suitable adjustments in the production process. ▲
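The four steps of Example 13.1 can be sketched in Python as follows. This is an illustrative helper that assumes $\sigma$ is known. The name `z_test` and its `alternative` argument are hypothetical, not taken from any library. It returns the value of the statistic and the decision for each of the three alternatives discussed above.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha, alternative="two-sided"):
    """One-sample z test for a mean with known sigma.

    alternative: "two-sided" (mu != mu0), "greater" (mu > mu0), or "less" (mu < mu0).
    Returns (z, reject).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))  # step 3: value of the test statistic
    std = NormalDist()
    if alternative == "two-sided":        # step 2: critical region |z| >= z_{alpha/2}
        reject = abs(z) >= std.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":        # z >= z_alpha
        reject = z >= std.inv_cdf(1 - alpha)
    elif alternative == "less":           # z <= -z_alpha
        reject = z <= -std.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, reject                      # step 4: the decision


z, reject = z_test(8.112, 8, 0.16, 25, alpha=0.01)
print(f"z = {z:.2f}, reject H0: {reject}")  # z = 3.50, reject H0: True
```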


It should be noted that the critical region $z \geq z_{\alpha}$ can also be used to test the null hypothesis $\mu = \mu_0$ against the simple alternative $\mu = \mu_1 > \mu_0$, or the composite null hypothesis $\mu \leq \mu_0$ against the composite alternative $\mu > \mu_0$. In the first case we would be testing a simple hypothesis against a simple alternative as in Section 12.4 (see Example 12.4 on page 394, where we studied this test for $\sigma = 1$), and in the second case $\alpha$ would be the maximum probability of committing a type I error for any value of $\mu$ assumed under the null hypothesis. Of course, similar arguments apply to the critical region $z \leq -z_{\alpha}$.

When we are dealing with a large sample of size $n \geq 30$ from a population which need not be normal but has a finite variance, we can use the central limit theorem to justify using the test for normal populations, and even when $\sigma^2$ is unknown we can approximate its value with $s^2$ in the computation of the test statistic. To illustrate the use of such an approximate large-sample test, consider the following example:


EXAMPLE 13.2

Suppose that 100 tires of a certain brand lasted on the average 21,431 miles with a standard deviation of 1,295 miles. Using $\alpha = 0.05$, test the null hypothesis $\mu = 22{,}000$ miles against the alternative hypothesis $\mu < 22{,}000$.

Solution
1. $H_0\colon \mu = 22{,}000$
   $H_1\colon \mu < 22{,}000$
2. Reject the null hypothesis if $z \leq -1.645$, where
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
3. Substituting $\bar{x} = 21{,}431$, $\mu_0 = 22{,}000$, $s = 1{,}295$ for $\sigma$, and $n = 100$, we get
$$z = \frac{21{,}431 - 22{,}000}{1{,}295/\sqrt{100}} = -4.39$$
4. Since $z = -4.39$ is less than $-1.645$, the null hypothesis must be rejected; we conclude that the tires are not as good as claimed. ▲
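Example 13.2 differs from Example 13.1 in two ways: $s$ stands in for $\sigma$, and the alternative is one-sided. A minimal sketch of the arithmetic follows. It relies on the large-sample justification given above and uses the lower-tail critical value $-z_{0.05}$.

```python
from math import sqrt
from statistics import NormalDist

# Example 13.2: large-sample test with s = 1,295 substituted for sigma
xbar, mu0, s, n, alpha = 21_431, 22_000, 1_295, 100, 0.05
z = (xbar - mu0) / (s / sqrt(n))      # (21,431 - 22,000) / 129.5
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value, about -1.645
reject = z <= z_crit
print(f"z = {z:.2f}, reject H0: {reject}")
```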


When $n < 30$ and $\sigma^2$ is unknown, the test which we have been discussing in this section cannot be used. However, in Exercise 10 on page 409 we saw that for random samples from normal populations the likelihood ratio technique yields a corresponding test based on

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

which, according to Theorem 8.12, is a value of a random variable having the $t$ distribution with $n - 1$ degrees of freedom. Thus, critical regions of size $\alpha$ for testing the null hypothesis $\mu = \mu_0$ against the alternatives $\mu \neq \mu_0$, $\mu > \mu_0$, or $\mu < \mu_0$ are, respectively, $|t| \geq t_{\alpha/2, n-1}$, $t \geq t_{\alpha, n-1}$, and $t \leq -t_{\alpha, n-1}$. Note that the comments made on page 416 in connection with the alternative hypothesis $\mu_1 > \mu_0$ and the test of the null hypothesis $\mu \leq \mu_0$ against the alternative $\mu > \mu_0$ apply also in this case.

To illustrate this one-sample $t$ test, as it is usually called, consider the following example:


EXAMPLE 13.3

Suppose that the specifications for a certain kind of ribbon call for a mean breaking strength of 185 pounds, and that five pieces randomly selected from different rolls have a mean breaking strength of 183.1 pounds with a standard deviation of 8.2 pounds. Assuming that we can look upon the data as a random sample from a normal population, test the null hypothesis $\mu = 185$ against the alternative hypothesis $\mu < 185$ at $\alpha = 0.05$.

Solution
1. $H_0\colon \mu = 185$
   $H_1\colon \mu < 185$
2. Reject the null hypothesis if $t \leq -2.132$, where
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
and 2.132 is the value of $t_{0.05, 4}$.
3. Substituting $\bar{x} = 183.1$, $\mu_0 = 185$, $s = 8.2$, and $n = 5$, we get
$$t = \frac{183.1 - 185}{8.2/\sqrt{5}} = -0.52$$
4. Since $t = -0.52$ is greater than $-2.132$, the null hypothesis cannot be rejected. If we have to go beyond this and say that the rolls of ribbon from which the sample was selected meet specifications, we are, of course, exposed to the unknown risk of committing a type II error. ▲
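The arithmetic of step 3 is easy to check in code. The Python standard library has no $t$ quantile function, so the sketch below takes $t_{0.05,4} = 2.132$ from the text; if SciPy happens to be installed, `scipy.stats.t.ppf(0.95, 4)` returns approximately the same value. Here $\bar{x} - \mu_0 = -1.9$ and $s/\sqrt{n} \approx 3.667$, so $t \approx -0.518$. The helper name is hypothetical.

```python
from math import sqrt


def one_sample_t(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), referred to the t distribution with n - 1 df."""
    return (xbar - mu0) / (s / sqrt(n))


t = one_sample_t(183.1, 185, 8.2, 5)
t_crit = 2.132  # t_{0.05,4} from the t table, as quoted in step 2
reject = t <= -t_crit
print(f"t = {t:.2f}, reject H0: {reject}")  # t = -0.52, reject H0: False
```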


13.3 TESTS CONCERNING DIFFERENCES 
BETWEEN MEANS 


In applied research, there are many problems in which we are interested in 
hypotheses concerning differences between the means of two populations. For 
instance, we may want to decide upon the basis of suitable samples whether men 
can perform a certain task as fast as women, or we may want to decide on 
the basis of an appropriate sample survey whether the average weekly food 
expenditures of families in one city exceed those of families in another city by 
at least $5.00. 

Let us suppose that we are dealing with independent random samples of sizes $n_1$ and $n_2$ from two normal populations having the means $\mu_1$ and $\mu_2$ and the known variances $\sigma_1^2$ and $\sigma_2^2$, and that we want to test the null hypothesis $\mu_1 - \mu_2 = \delta$, where $\delta$ is a given constant, against one of the alternatives $\mu_1 - \mu_2 \neq \delta$, $\mu_1 - \mu_2 > \delta$, or $\mu_1 - \mu_2 < \delta$. Applying the likelihood ratio technique, we will arrive at a test based on $\bar{x}_1 - \bar{x}_2$, and, referring to Exercise 4 on page 281, we find that the respective critical regions can be written as $|z| \geq z_{\alpha/2}$, $z \geq z_{\alpha}$, and $z \leq -z_{\alpha}$, where

$$z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

When we deal with independent random samples from populations with unknown variances which may not even be normal, we can still use the test which we have just described with $s_1$ substituted for $\sigma_1$ and $s_2$ substituted for $\sigma_2$, so long as both samples are large enough for the central limit theorem to be invoked.


EXAMPLE 13.4 


Suppose that the nicotine contents of two brands of cigarettes are being measured. 

If in an experiment fifty cigarettes of the first brand had an average nicotine 

content of x, — 2.61 milligrams with a standard deviation of s, — 0.12 milligram, 

while forty cigarettes of the second brand had an average nicotine content of 

X; = 2.38 milligrams with a standard deviation of s; = 0.14 milligram, test 

the null hypothesis ш, — ш = 0.2 against the alternative и, — 4; # 0.2, using 
= 0.05. 


Solution 
1. $H_0$: $\mu_1 - \mu_2 = 0.20$
   $H_1$: $\mu_1 - \mu_2 \ne 0.20$
2. Reject the null hypothesis if $z \le -1.96$ or $z \ge 1.96$, where

$$z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

3. Substituting $\bar{x}_1 = 2.61$, $\bar{x}_2 = 2.38$, $\delta = 0.20$, $s_1 = 0.12$ for $\sigma_1$, $s_2 = 0.14$
   for $\sigma_2$, $n_1 = 50$, and $n_2 = 40$, we get

$$z = \frac{2.61 - 2.38 - 0.20}{\sqrt{\dfrac{(0.12)^2}{50} + \dfrac{(0.14)^2}{40}}} = 1.08$$

4. Since $z = 1.08$ falls between $-1.96$ and $1.96$, the null hypothesis cannot
   be rejected; either we accept the null hypothesis or we say that the
   difference between $2.61 - 2.38 = 0.23$ and $0.20$ is not significant,
   namely, that it is not large enough to reject the null hypothesis. ▲
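The arithmetic of Example 13.4 can be checked with a short Python sketch of the two-sample z statistic. The helper names `two_sample_z` and `reject_two_sided` are hypothetical (they are not from any library), only the standard library is used, and the critical value 1.96 is supplied by hand from the normal table rather than computed. As in the text, the sample standard deviations stand in for $\sigma_1$ and $\sigma_2$ because both samples are large.

```python
import math


def two_sample_z(xbar1, xbar2, sd1, sd2, n1, n2, delta=0.0):
    """z statistic for H0: mu1 - mu2 = delta.

    sd1 and sd2 are the known population standard deviations, or the
    sample standard deviations s1 and s2 when both samples are large.
    """
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (xbar1 - xbar2 - delta) / standard_error


def reject_two_sided(z, z_half_alpha):
    """Two-sided criterion |z| >= z_{alpha/2}; z_half_alpha comes from a normal table."""
    return abs(z) >= z_half_alpha


# Example 13.4: nicotine contents of two brands of cigarettes
z = two_sample_z(2.61, 2.38, 0.12, 0.14, 50, 40, delta=0.20)
print(round(z, 2))                 # 1.08
print(reject_two_sided(z, 1.96))   # False
```

The one-sided criteria $z \ge z_\alpha$ and $z \le -z_\alpha$ follow the same pattern with a single comparison.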


When $n_1$ and $n_2$ are small and $\sigma_1$ and $\sigma_2$ are unknown, the test which we
have been discussing cannot be used. However, for independent random
samples from two normal populations having the same unknown variance $\sigma^2$,
the likelihood ratio technique yields a test based on

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

From Section 11.3, we know that under the given assumptions and the null
hypothesis $\mu_1 - \mu_2 = \delta$, the above expression for $t$ is a value of a random variable
having the $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom. Thus, the appropriate
critical regions of size $\alpha$ for testing the null hypothesis $\mu_1 - \mu_2 = \delta$ against
the alternatives $\mu_1 - \mu_2 \ne \delta$, $\mu_1 - \mu_2 > \delta$, or $\mu_1 - \mu_2 < \delta$ under the given
assumptions are, respectively, $|t| \ge t_{\alpha/2,\,n_1+n_2-2}$, $t \ge t_{\alpha,\,n_1+n_2-2}$, and $t \le -t_{\alpha,\,n_1+n_2-2}$.
To illustrate this two-sample t test, consider the following problem:
EXAMPLE 13.5 


In the comparison of two kinds of paint, a consumer testing service finds that 
four one-gallon cans of one brand cover on the average 512 square feet with a 
standard deviation of 31 square feet, while four one-gallon cans of another brand 
cover on the average 492 square feet with a standard deviation of 26 square feet. 
Test the null hypothesis $\mu_1 - \mu_2 = 0$ against the alternative hypothesis $\mu_1 - \mu_2 \ne 0$
at the level of significance $\alpha = 0.05$. Assume that the two populations are
normal and have equal variances. 


Solution 
1. $H_0$: $\mu_1 - \mu_2 = 0$
   $H_1$: $\mu_1 - \mu_2 \ne 0$
2. Reject the null hypothesis if $t \le -2.447$ or $t \ge 2.447$, where

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

   and 2.447 is the value of $t_{0.025,6}$.
3. First calculating $s_p$, we get

$$s_p = \sqrt{\frac{3(31)^2 + 3(26)^2}{4 + 4 - 2}} = 28.609$$

   and then substituting its value together with $\bar{x}_1 = 512$, $\bar{x}_2 = 492$, $\delta = 0$,
   and $n_1 = n_2 = 4$ into the formula for $t$, we obtain

$$t = \frac{512 - 492}{28.609\sqrt{\dfrac{1}{4} + \dfrac{1}{4}}} = 0.99$$

4. Since $t = 0.99$ falls between $-2.447$ and $2.447$, the null hypothesis
   cannot be rejected. Even though the difference between the two sample
   means is fairly large, the samples are so small that the results are not
   conclusive; that is, the difference between the two sample means may
   well be due to chance. ▲
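A minimal Python sketch of the pooled two-sample t statistic, reproducing steps 3 and 4 of Example 13.5. The helpers `pooled_sd` and `pooled_t` are hypothetical names chosen for illustration; the sketch assumes normal populations with equal variances, as the example does, and takes the critical value $t_{0.025,6} = 2.447$ from the t table instead of computing it.

```python
import math


def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation s_p for two samples assumed to share one variance."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t(xbar1, xbar2, s1, s2, n1, n2, delta=0.0):
    """Two-sample t statistic; refer it to the t distribution with n1 + n2 - 2 df."""
    sp = pooled_sd(s1, s2, n1, n2)
    return (xbar1 - xbar2 - delta) / (sp * math.sqrt(1 / n1 + 1 / n2))


# Example 13.5: paint coverage, four cans of each brand
sp = pooled_sd(31, 26, 4, 4)
t = pooled_t(512, 492, 31, 26, 4, 4)
print(round(sp, 3), round(t, 2))   # 28.609 0.99
# t_{0.025,6} = 2.447 is taken from the t table, as in the text
print(abs(t) >= 2.447)             # False
```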


If the assumption $\sigma_1^2 = \sigma_2^2$ is untenable, there are several alternative methods
that can be used. A relatively simple one consists of randomly pairing the values
obtained in the two samples and then looking upon their differences as a random
sample of size $n_1$ or $n_2$, whichever is smaller, from a normal population which,
under the null hypothesis, has the mean $\mu = \delta$. Then we test this null hypothesis
against the appropriate alternative by means of the methods of Section 13.2. This
is a good reason for having $n_1 = n_2$, but there exist alternative techniques for
handling the case where $n_1 \ne n_2$; one of these, the Smith-Satterthwaite test, is
referred to on page 445.

So far we have limited our discussion to random samples that are independent,
and the methods which we have introduced in this section cannot be used,
for example, to decide on the basis of weights "before and after" whether a
certain diet is really effective, or whether an observed difference between the
average I.Q.'s of husbands and their wives is really significant. In both of these
examples the samples are not independent because the data are actually paired.
A common way of handling this kind of problem is to proceed as in the preceding
paragraph, namely, to work with the differences between the paired measurements
or observations. If $n$ is large, we can then use the test described on page 415 to
test the null hypothesis $\mu_1 - \mu_2 = \delta$ against the appropriate alternative, and if
$n$ is small, we can use the t test described on page 417, provided the differences
can be looked upon as a random sample from a normal population.
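The paired procedure just described reduces to a one-sample t test on the differences. The following Python sketch assumes, as the text requires, that the differences are a random sample from a normal population. The helper `paired_t` is hypothetical, and the three pairs in the usage line are made-up numbers chosen only so the result is easy to verify by hand (differences 2, 4, 6); they are not data from this book. The critical value $t_{\alpha,n-1}$ is still read from a t table.

```python
import math
from statistics import mean, stdev


def paired_t(before, after, delta=0.0):
    """One-sample t statistic on the paired differences before - after.

    Returns (t, degrees_of_freedom). Compare t with t_{alpha, n-1}
    (or with z_alpha when n is large). Assumes the differences are a
    random sample from a normal population.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = (mean(diffs) - delta) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical illustration only: the differences are 2, 4, and 6
t, df = paired_t([12, 15, 20], [10, 11, 14])
print(round(t, 4), df)   # 3.4641 2
```

Working with the differences, rather than the two columns separately, is what removes the dependence between the paired observations.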


THEORETICAL EXERCISES 


1. Given a random sample of size $n$ from a normal population with the known
variance $\sigma^2$, show that the null hypothesis $\mu = \mu_0$ can be tested against the
alternative $\mu \ne \mu_0$ with the use of a one-tailed criterion based on the
chi-square distribution.


2. Suppose that a random sample from a normal population with the known
variance $\sigma^2$ is to be used to test the null hypothesis $\mu = \mu_0$ against the
alternative hypothesis $\mu = \mu_1$, where $\mu_1 > \mu_0$, and that the probabilities of
type I and type II errors are to have the preassigned values $\alpha$ and $\beta$. Show
that the required size of the sample is given by

$$n = \frac{\sigma^2 (z_\alpha + z_\beta)^2}{(\mu_1 - \mu_0)^2}$$

Also use this formula to find $n$ when $\sigma = 9$, $\mu_0 = 15$, $\mu_1 = 20$, $\alpha = 0.05$,
and $\beta = 0.01$.


3. Suppose that independent random samples of size $n$ from two normal populations
with the known variances $\sigma_1^2$ and $\sigma_2^2$ are to be used to test the null
hypothesis $\mu_1 - \mu_2 = \delta$ against the alternative hypothesis $\mu_1 - \mu_2 = \delta'$, and
that the probabilities of type I and type II errors are to have the preassigned
values $\alpha$ and $\beta$. Show that the required size of the sample is given by

$$n = \frac{(\sigma_1^2 + \sigma_2^2)(z_\alpha + z_\beta)^2}{(\delta - \delta')^2}$$

Also use this formula to find $n$ when $\sigma_1 = 9$, $\sigma_2 = 13$, $\delta = 80$, $\delta' = 86$,
$\alpha = 0.01$, and $\beta = 0.01$.


APPLIED EXERCISES 


4. According to the norms established for a reading comprehension test, eighth
graders should average 84.3 with a standard deviation of 8.6. If 45 randomly
selected eighth graders from a certain school district averaged 87.8, test the
null hypothesis $\mu = 84.3$ for that school district against the alternative
hypothesis $\mu > 84.3$, using $\alpha = 0.01$.


5. The security department of a factory wants to know whether the true average
time required by the night watchman to walk his round is 30 minutes. If, in
a random sample of 32 rounds, the night watchman averaged 30.8 minutes
with a standard deviation of 1.5 minutes, determine at $\alpha = 0.01$ whether this
is sufficient evidence to reject the null hypothesis $\mu = 30$ minutes in favor
of the alternative hypothesis $\mu \ne 30$ minutes.


6. In 12 test runs over a marked course, a newly designed motorboat averaged
33.6 seconds with a standard deviation of 2.3 seconds. Assuming that it is
reasonable to treat the data as a random sample from a normal population,
test the null hypothesis $\mu = 35$ against the alternative $\mu < 35$ at the level
of significance $\alpha = 0.05$.


7. Five measurements of the tar content of a certain kind of cigarette yielded
14.5, 14.2, 14.4, 14.3, and 14.6 mg/cig. Show that for $\alpha = 0.05$ the null
hypothesis $\mu = 14.0$ must be rejected in favor of the alternative hypothesis
$\mu \ne 14.0$. Assume that the data are a random sample from a normal population.


8. Suppose that in the preceding exercise the first measurement is recorded
incorrectly as 16.0 instead of 14.5. Show that this will reverse the result, and
explain the apparent paradox that even though the difference between the
sample mean and $\mu_0$ has increased, it is no longer significant.


9. With reference to Example 13.4, for what values of $\bar{x}_1 - \bar{x}_2$ would the null
hypothesis have been rejected? Also find the probabilities of type II errors
with the given criterion if (a) $\mu_1 - \mu_2 = 0.12$, (b) $\mu_1 - \mu_2 = 0.16$, (c) $\mu_1 - \mu_2 = 0.24$,
and (d) $\mu_1 - \mu_2 = 0.28$.


10. A sample study was made of the number of business lunches that executives
claim as deductible expenses per month. If 40 executives in the insurance
industry averaged 9.1 such deductions with a standard deviation of 1.9 in a
given month, while 50 bank executives averaged 8.0 with a standard deviation
of 2.1, test the null hypothesis $\mu_1 - \mu_2 = 0$ against the alternative hypothesis
$\mu_1 - \mu_2 \ne 0$ at $\alpha = 0.05$.

11. Sample surveys conducted in a large county in 1950 and again in 1970 showed
that in 1950 the average height of 400 ten-year-old boys was 53.2 inches with
a standard deviation of 2.4 inches, while in 1970 the average height of 500
ten-year-old boys was 54.5 inches with a standard deviation of 2.5 inches.
Test the null hypothesis $\mu_1 - \mu_2 = -0.5$ against the alternative hypothesis
$\mu_1 - \mu_2 < -0.5$ at the level of significance $\alpha = 0.05$.

12. To find out whether the inhabitants of two South Pacific islands may be
regarded as having the same racial ancestry, an anthropologist determines
the cephalic indices of six adult males from each island, getting $\bar{x}_1 = 77.4$,
$\bar{x}_2 = 72.2$, and the corresponding standard deviations $s_1 = 3.3$ and $s_2 = 2.1$.
Use $\alpha = 0.01$ to check whether the difference between the two sample means
can reasonably be attributed to chance. Assume that the populations are
normal and have the same variance.

13. If 8 short-range rockets of one kind have a mean target error of $\bar{x}_1 = 98$ feet
with a standard deviation of $s_1 = 18$ feet while 10 short-range rockets of
another kind have a mean target error of $\bar{x}_2 = 76$ feet with a standard
deviation of $s_2 = 15$ feet, test the null hypothesis $\mu_1 - \mu_2 = 15$ against the
alternative hypothesis $\mu_1 - \mu_2 > 15$. Let the size of the critical region be
$\alpha = 0.05$ and assume that the populations are normal and have the same
variance.

group of 16 persons engaged in these exercises for one month and showed
the following results:

Weight   Weight      Weight   Weight
before   after       before   after
 211      198         172      166
 180      173         155      154
 171      172         185      181
 214      209         167      164
 182      179         203      201
 194      192         181      175
 160      161         245      233
 182      182         146      142

Test the null hypothesis $\mu_1 - \mu_2 = 0$ against the alternative hypothesis
$\mu_1 - \mu_2 > 0$ at the level of significance $\alpha = 0.05$. Assume that the differences have
a normal distribution.


15. To determine the effectiveness of an industrial safety program, the following 


37. Use $\alpha = 0.01$ to test the null hypothesis that the safety program is not


concerning the means.

The tests which we shall study in this section include a test of the null
hypothesis that the variance of a normal population equals a given constant, and
the likelihood ratio test of the equality of the variances of two normal populations
(which was referred to in Exercise 14 on page 411).

The first of these tests is essentially that of Exercise 12 on page 410. Given
a random sample of size $n$ from a normal population, we shall want to test the
null hypothesis $\sigma^2 = \sigma_0^2$ against one of the alternatives $\sigma^2 \ne \sigma_0^2$, $\sigma^2 > \sigma_0^2$, or
$\sigma^2 < \sigma_0^2$, and, as the reader should have discovered in Exercise 12 on page 410,
the likelihood ratio technique leads to a test based on $s^2$, the value of the sample
variance. Based on Theorem 8.10, we can thus write the critical regions for testing
the null hypothesis against the two one-sided alternatives as $\chi^2 \ge \chi^2_{\alpha,n-1}$ and
$\chi^2 \le \chi^2_{1-\alpha,n-1}$, where

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

So far as the two-sided alternative is concerned, we reject the null hypothesis if
$\chi^2 \ge \chi^2_{\alpha/2,n-1}$ or $\chi^2 \le \chi^2_{1-\alpha/2,n-1}$, and the size of all these critical regions is, of
course, equal to $\alpha$.


EXAMPLE 13.6 


Suppose that the thickness of a part used in a semiconductor is its critical
dimension and that measurements of the thickness of a random sample of 18
such parts have the variance $s^2 = 0.68$, where the measurements are in
thousandths of an inch. The process is considered to be under control if the
variation of the thicknesses is given by a variance not greater than 0.36. Assuming
that the measurements constitute a random sample from a normal population,
test the null hypothesis $\sigma^2 = 0.36$ against the alternative hypothesis $\sigma^2 > 0.36$
at $\alpha = 0.05$.
Solution 
1. $H_0$: $\sigma^2 = 0.36$
   $H_1$: $\sigma^2 > 0.36$
2. Reject the null hypothesis if $\chi^2 \ge 27.587$, where

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

   and 27.587 is the value of $\chi^2_{0.05,17}$.
3. Substituting $s^2 = 0.68$, $\sigma_0^2 = 0.36$, and $n = 18$, we get

$$\chi^2 = \frac{17(0.68)}{0.36} = 32.11$$

4. Since $\chi^2 = 32.11$ exceeds 27.587, the null hypothesis must be rejected
   and the process used in the manufacture of the parts must be
   adjusted. ▲
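Step 3 of Example 13.6 is a one-line computation; the Python sketch below (with a hypothetical helper name, `chi_square_variance`) makes the statistic and both comparisons explicit. Only the standard library is used, so the critical values $\chi^2_{0.05,17} = 27.587$ and $\chi^2_{0.01,17} = 33.409$ are copied from the chi-square table quoted in the text rather than computed.

```python
def chi_square_variance(s2, sigma0_sq, n):
    """Statistic (n - 1) s^2 / sigma_0^2, referred to chi-square with n - 1 df."""
    return (n - 1) * s2 / sigma0_sq


# Example 13.6: s^2 = 0.68 from n = 18 parts, sigma_0^2 = 0.36
chi2 = chi_square_variance(0.68, 0.36, 18)
print(round(chi2, 2))    # 32.11
# Upper-tail criterion for the alternative sigma^2 > sigma_0^2
print(chi2 >= 27.587)    # True  (alpha = 0.05)
print(chi2 >= 33.409)    # False (alpha = 0.01)
```

The second comparison is the point made in the note that follows: the same data lead to opposite decisions at the two levels of significance.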


Note that if $\alpha$ had been 0.01 in Example 13.6, the null hypothesis could
not have been rejected, since $\chi^2 = 32.11$ does not exceed $\chi^2_{0.01,17} = 33.409$. This
serves to indicate that the choice of $\alpha$ is something which should always be
specified in advance, so that we will be spared the temptation of choosing a level
of significance which happens to suit our purpose.

In Exercise 14 on page 411 the reader was asked to show that the likelihood
ratio statistic for testing the equality of the variances of two normal populations
can be expressed in terms of the ratio of the two sample variances. Given
independent random samples of size $n_1$ and $n_2$ from two normal populations
with the variances $\sigma_1^2$ and $\sigma_2^2$, we thus find from Theorem 8.14 that corresponding
critical regions of size $\alpha$ for testing the null hypothesis $\sigma_1^2 = \sigma_2^2$ against the
one-sided alternatives $\sigma_1^2 > \sigma_2^2$ or $\sigma_1^2 < \sigma_2^2$ are, respectively,

$$\frac{s_1^2}{s_2^2} \ge F_{\alpha,\,n_1-1,\,n_2-1} \quad\text{and}\quad \frac{s_2^2}{s_1^2} \ge F_{\alpha,\,n_2-1,\,n_1-1}$$

where $F_{\alpha,n_1-1,n_2-1}$ and $F_{\alpha,n_2-1,n_1-1}$ are as defined on page 293. The appropriate
critical region for testing the null hypothesis against the two-sided alternative
$\sigma_1^2 \ne \sigma_2^2$ is

$$\frac{s_1^2}{s_2^2} \ge F_{\alpha/2,\,n_1-1,\,n_2-1} \quad\text{if } s_1^2 > s_2^2$$

and

$$\frac{s_2^2}{s_1^2} \ge F_{\alpha/2,\,n_2-1,\,n_1-1} \quad\text{if } s_1^2 < s_2^2$$

Note that this test is based entirely on the right tail of the F distribution, which
is made possible by the result of Exercise 16 on page 297, namely, by the fact
that if the random variable $X$ has an F distribution with $\nu_1$ and $\nu_2$ degrees of
freedom, then $1/X$ has an F distribution with $\nu_2$ and $\nu_1$ degrees of freedom.

EXAMPLE 13.7


In comparing the variability of the tensile strength of two kinds of structural
steel, an experiment yielded the following results: $n_1 = 13$, $s_1^2 = 19.2$, $n_2 = 16$,
and $s_2^2 = 3.5$, where the units of measurement are 1,000 pounds per square inch.
Assuming that the measurements constitute independent random samples from
two normal populations, test the null hypothesis $\sigma_1^2 = \sigma_2^2$ against the alternative
$\sigma_1^2 \ne \sigma_2^2$ at the $\alpha = 0.02$ level of significance.


Solution 


1. $H_0$: $\sigma_1^2 = \sigma_2^2$
   $H_1$: $\sigma_1^2 \ne \sigma_2^2$
2. Since $s_1^2 > s_2^2$, reject the null hypothesis if $\dfrac{s_1^2}{s_2^2} \ge 3.67$, where 3.67 is the
   value of $F_{0.01,12,15}$.
3. Substituting $s_1^2 = 19.2$ and $s_2^2 = 3.5$, we get

$$\frac{s_1^2}{s_2^2} = \frac{19.2}{3.5} = 5.49$$

4. Since $s_1^2/s_2^2 = 5.49$ exceeds 3.67, the null hypothesis must be rejected; we
   conclude that the variability of the tensile strength of the two kinds of
   steel is not the same. ▲
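The rule "put the larger sample variance in the numerator" is easy to encode. In this sketch the helper `variance_ratio_two_sided` is a hypothetical name; it returns the ratio together with the numerator and denominator degrees of freedom, so that the right-tail value $F_{\alpha/2,\,\nu_1,\,\nu_2}$ can be looked up in the F table. The critical value 3.67 is taken from the table quoted in Example 13.7, not computed.

```python
def variance_ratio_two_sided(s1_sq, s2_sq, n1, n2):
    """Return (ratio, numerator_df, denominator_df) for the two-sided F test.

    The larger sample variance goes in the numerator, so only the right
    tail F_{alpha/2, numerator_df, denominator_df} of the F table is needed.
    """
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1


# Example 13.7: tensile strength of two kinds of steel
ratio, df_num, df_den = variance_ratio_two_sided(19.2, 3.5, 13, 16)
print(round(ratio, 2), df_num, df_den)   # 5.49 12 15
print(ratio >= 3.67)                      # True; 3.67 is F_{0.01,12,15}
```

Swapping the two samples gives the same ratio and the same degrees of freedom, which is the symmetry that lets the test use only the right tail.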


THEORETICAL EXERCISES 

1. Making use of the fact that the chi-square distribution can be approximated
with a normal distribution when $\nu$, the number of degrees of freedom, is large,
show that for large samples from normal populations

$$s^2 \ge \sigma_0^2\left[1 + z_\alpha\sqrt{\frac{2}{n-1}}\right]$$

is an approximate critical region of size $\alpha$ for testing the null hypothesis
$\sigma^2 = \sigma_0^2$ against the alternative $\sigma^2 > \sigma_0^2$. Also construct corresponding critical
regions for testing this null hypothesis against the alternatives $\sigma^2 < \sigma_0^2$ and
$\sigma^2 \ne \sigma_0^2$. (See Exercise 4 on page 295.)

2. Making use of the result of Exercise 5 on page 295, show that for large random
samples from normal populations, tests of the null hypothesis $\sigma^2 = \sigma_0^2$ can
be based on the statistic

$$\left(\frac{s}{\sigma_0} - 1\right)\sqrt{2n}$$

which has approximately the standard normal distribution.


APPLIED EXERCISES 


3. If nine determinations of the specific heat of iron had a standard deviation
of 0.0086, test the null hypothesis that $\sigma = 0.0100$ for a normal population
of such determinations. Use the alternative hypothesis $\sigma < 0.0100$ and the
level of significance $\alpha = 0.05$.


4. In a random sample, the weights of 24 Black Angus steers of a certain age
have a standard deviation of 238 pounds. Assuming that the weights constitute
a random sample from a normal population, test the null hypothesis $\sigma = 250$
pounds against the two-sided alternative $\sigma \ne 250$ pounds at the level of
significance $\alpha = 0.01$.


5. In a random sample, the time which 30 women took to complete the written
test for their driver's license had a variance of 6.4 minutes. Assuming that the
population sampled is normal, test the null hypothesis $\sigma^2 = 8$ against the
alternative $\sigma^2 < 8$ at the level of significance $\alpha = 0.05$ using

(a) the method described in the text;

(b) the method of Exercise 2.


6. Test at the level of significance $\alpha = 0.02$ whether it was reasonable to assume
in Example 13.5 on page 420 that the two populations have equal variances.


7. Test at the level of significance $\alpha = 0.10$ whether it was reasonable to assume
in Exercise 12 on page 423 that the two populations have equal variances.


8. The following are the scores obtained in a personality test by samples of nine
married women and nine unmarried women:


— 28.68. 77. 82. 63-80. 78 71 72 
Married 


КЭТ 677-74 ОТА NGA TL 71. 72 


Assuming that these data can be looked upon as independent random samples
from two normal populations, test the null hypothesis $\sigma_1^2 = \sigma_2^2$ against the
one-sided alternative $\sigma_1^2 > \sigma_2^2$ at the level of significance $\alpha = 0.05$. ($\sigma_1^2$ and
$\sigma_2^2$ are, respectively, the variance of the scores of unmarried women and the
variance of the scores of married women.)

13.5 TESTS CONCERNING PROPORTIONS


If an outcome of an experiment is the number of votes which a candidate receives 
in a poll, the number of imperfections found in a piece of cloth, the number of 
children who are absent from school on a given day, ..., we refer to these data 
as count data. Appropriate models for the analysis of count data are the binomial 
distribution, the Poisson distribution, the multinomial distribution, and some of 
the other discrete distributions which we studied in Chapter 5. In this section we 
shall present one of the most common tests based on count data, namely, a test
concerning the parameter $\theta$ of the binomial distribution.

The binomial parameter $\theta$ is the probability of a success on an individual
trial and, hence, the proportion of successes one can expect in the long run.
Testing on the basis of a sample whether the true proportion of cures from a
certain disease is 0.90 or whether the true proportion of defectives coming off
an assembly line is 0.02 is, thus, equivalent to testing hypotheses about the
parameter $\theta$ of binomial populations.

In Exercise 9 on page 396 the reader was asked to show that the most
powerful critical region for testing the null hypothesis $\theta = \theta_0$ against the alternative
$\theta = \theta_1 < \theta_0$, where $\theta$ is the parameter of a binomial population, is based
on the value of $x$, the number of "successes" obtained in $n$ trials. When it comes
to composite alternatives, the likelihood ratio technique also yields tests based
on the observed number of successes (as we saw in Exercise 8 on page 409 for
the special case where $\theta_0 = \frac{1}{2}$). In fact, if we want to test the null hypothesis
$\theta = \theta_0$ against the one-sided alternative $\theta > \theta_0$, the critical region of size $\alpha$ of
the likelihood ratio criterion is

$$x \ge k_\alpha$$

where $k_\alpha$ is the smallest integer for which

$$\sum_{y=k_\alpha}^{n} b(y; n, \theta_0) \le \alpha$$

and $b(y; n, \theta_0)$ is the probability of getting $y$ successes in $n$ binomial trials when
$\theta = \theta_0$. The size of this critical region, as well as the ones which follow, is thus
as close as possible to $\alpha$ without exceeding it.
The corresponding critical region for testing the null hypothesis $\theta = \theta_0$
against the one-sided alternative $\theta < \theta_0$ is

$$x \le k'_\alpha$$

430 Chap. 13: Hypothesis Testing: Applications 


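where $k'_\alpha$ is the largest integer for which

$$\sum_{y=0}^{k'_\alpha} b(y; n, \theta_0) \le \alpha$$

and, finally, the critical region for testing the null hypothesis $\theta = \theta_0$ against the
two-sided alternative $\theta \ne \theta_0$ is

$$x \ge k_{\alpha/2} \quad\text{or}\quad x \le k'_{\alpha/2}$$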


EXAMPLE 13.8 


If the number of successes in 20 trials of a binomial experiment is 5, test the null
hypothesis $\theta = 0.50$ against the two-sided alternative $\theta \ne 0.50$ at the $\alpha = 0.05$
level of significance.


Solution 
1. $H_0$: $\theta = 0.50$
   $H_1$: $\theta \ne 0.50$
2. Reject the null hypothesis if $x \le 5$ or $x \ge 15$, where 5 and 15 are the
   values of $k'_{0.025}$ and $k_{0.025}$ determined from Table I.
3. The observed number of successes is $x = 5$.
4. Since $x = 5$ falls into the critical region, the null hypothesis must be
   rejected; we conclude that $\theta \ne 0.50$. ▲
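Where Table I is not at hand, the constants $k_\alpha$ and $k'_\alpha$ can be found directly from their defining sums. The Python sketch below is one straightforward way to do that, assuming exact binomial probabilities computed with `math.comb`; the helper names `binom_pmf`, `upper_critical`, and `lower_critical` are hypothetical. For Example 13.8 it recovers the values 5 and 15 read from Table I.

```python
from math import comb


def binom_pmf(y, n, theta):
    """b(y; n, theta), the probability of y successes in n binomial trials."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


def upper_critical(n, theta0, alpha):
    """Smallest k with P(X >= k) <= alpha under theta0 (n + 1 if there is none)."""
    for k in range(n + 1):
        if sum(binom_pmf(y, n, theta0) for y in range(k, n + 1)) <= alpha:
            return k
    return n + 1


def lower_critical(n, theta0, alpha):
    """Largest k' with P(X <= k') <= alpha under theta0 (-1 if there is none)."""
    for k in range(n, -1, -1):
        if sum(binom_pmf(y, n, theta0) for y in range(0, k + 1)) <= alpha:
            return k
    return -1


# Example 13.8: n = 20, theta0 = 0.50, two-sided test at alpha = 0.05
k_upper = upper_critical(20, 0.50, 0.025)
k_lower = lower_critical(20, 0.50, 0.025)
print(k_lower, k_upper)   # 5 15
```

Scanning upward for $k_\alpha$ and downward for $k'_\alpha$ returns the first integer that satisfies the inequality, which is exactly the "smallest" and "largest" integer required by the definitions.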


The tests which we have described require the use of a table of binomial
probabilities, at least when $n$ is small. For $n \le 20$ we can use Table I at the end
of this book, and for values of $n$ up to 100 we can use the tables referred to on
page 207. Otherwise, we make use of the normal approximation to the binomial
distribution and treat

$$z = \frac{x - n\theta}{\sqrt{n\theta(1-\theta)}}$$

as a value of a random variable having the standard normal distribution. For
large $n$, we can thus test the null hypothesis $\theta = \theta_0$ against the alternatives
$\theta \ne \theta_0$, $\theta > \theta_0$, or $\theta < \theta_0$ using, respectively, the critical regions $|z| \ge z_{\alpha/2}$,
$z \ge z_\alpha$, and $z \le -z_\alpha$, where

$$z = \frac{x - n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}}$$

or

$$z = \frac{(x \pm \frac{1}{2}) - n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}}$$

if we use the continuity correction introduced in Example 6.4 on page 229. We
use the minus sign when $x$ exceeds $n\theta_0$ and the plus sign when $x$ is less than $n\theta_0$.


EXAMPLE 13.9 


An oil company claims that at most 20 percent of all automobile owners buy
brand A gasoline. Test this claim at $\alpha = 0.01$, if a random check indicates that
58 of 200 automobile owners buy brand A gasoline.


Solution 
1. $H_0$: $\theta = 0.20$
   $H_1$: $\theta > 0.20$
2. Reject the null hypothesis if $z \ge 2.33$, where (with the continuity
   correction)

$$z = \frac{(x - \frac{1}{2}) - n\theta_0}{\sqrt{n\theta_0(1-\theta_0)}}$$

3. Substituting $x = 58$, $n = 200$, and $\theta_0 = 0.20$, we get

$$z = \frac{57.5 - 40}{\sqrt{200(0.20)(0.80)}} = 3.09$$

4. Since $z = 3.09$ exceeds 2.33, the null hypothesis must be rejected; we
   conclude that brand A is bought by more than 20% of all automobile
   owners. ▲
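The large-sample statistic, with and without the continuity correction, can be sketched in a few lines of Python. The helper `binomial_z` is a hypothetical name; the sign rule for the correction (subtract $\frac{1}{2}$ when $x > n\theta_0$, add it when $x < n\theta_0$) follows the text, and the critical value $z_{0.01} = 2.33$ is taken from the normal table.

```python
import math


def binomial_z(x, n, theta0, continuity=True):
    """Large-sample z statistic for H0: theta = theta0.

    With continuity=True, 1/2 is subtracted from x when x exceeds n*theta0
    and added when x is less than n*theta0.
    """
    expected = n * theta0
    if continuity:
        if x > expected:
            x = x - 0.5
        elif x < expected:
            x = x + 0.5
    return (x - expected) / math.sqrt(n * theta0 * (1 - theta0))


# Example 13.9: 58 of 200 owners buy brand A; H0: theta = 0.20, H1: theta > 0.20
print(round(binomial_z(58, 200, 0.20), 2))                     # 3.09
print(round(binomial_z(58, 200, 0.20, continuity=False), 2))   # 3.18
```

Both values exceed 2.33, which is the comparison made in the note that follows.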


Note that if we had not used the continuity correction in the preceding
example, we would have had $z = 3.18$ and the conclusion would have been the
same.


13.6 TESTS CONCERNING DIFFERENCES
AMONG k PROPORTIONS


In applied research there are many problems in which we must decide whether
observed differences among sample proportions, or percentages, are significant
or whether they can be attributed to chance. For instance, if 6 percent of the
frozen chicken in a sample from one supplier fails to meet certain standards and
only 4 percent in a sample from another supplier fails, we may want to decide
whether the difference between the two percentages is significant. Similarly, we
may want to judge on the basis of sample data whether the actual proportion of
voters who favor a certain candidate is the same in four different cities.

To indicate a general method for handling problems of this kind, suppose
that $x_1, x_2, \ldots,$ and $x_k$ are observed values of a set of independent random
variables $X_1, X_2, \ldots,$ and $X_k$ having binomial distributions with the respective
parameters $n_1$ and $\theta_1$, $n_2$ and $\theta_2$, ..., and $n_k$ and $\theta_k$. If the $n$'s are sufficiently
large, we can approximate the distributions of the independent random variables

$$Z_i = \frac{X_i - n_i\theta_i}{\sqrt{n_i\theta_i(1-\theta_i)}} \qquad \text{for } i = 1, 2, \ldots, k$$

with standard normal distributions, and, according to Theorem 8.6, we can then
look upon

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - n_i\theta_i)^2}{n_i\theta_i(1-\theta_i)}$$

as a value of a random variable having the chi-square distribution with $k$ degrees
of freedom. To test the null hypothesis $\theta_1 = \theta_2 = \cdots = \theta_k = \theta_0$ (against the
alternative that at least one of the $\theta$'s does not equal $\theta_0$) we can thus use the
critical region $\chi^2 \ge \chi^2_{\alpha,k}$, where

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - n_i\theta_0)^2}{n_i\theta_0(1-\theta_0)}$$

When $\theta_0$ is not specified, that is, when we are interested only in the null
hypothesis $\theta_1 = \theta_2 = \cdots = \theta_k$, we substitute for $\theta$ the pooled estimate

$$\hat{\theta} = \frac{x_1 + x_2 + \cdots + x_k}{n_1 + n_2 + \cdots + n_k}$$

and the critical region becomes $\chi^2 \ge \chi^2_{\alpha,k-1}$, where

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - n_i\hat{\theta})^2}{n_i\hat{\theta}(1-\hat{\theta})}$$

The loss of one degree of freedom, namely, the change in the critical region from
$\chi^2_{\alpha,k}$ to $\chi^2_{\alpha,k-1}$, is due to the fact that an estimate is substituted for the unknown
parameter $\theta$; a formal discussion of this is referred to on page 445.

Let us now present an alternative form of the chi-square statistic for this
kind of test which, as we shall see in Section 13.7, lends itself more readily to
other applications. Arranging the data as in the following table,

$$\begin{array}{lcc}
 & \text{Successes} & \text{Failures} \\
\text{Sample 1} & x_1 & n_1 - x_1 \\
\text{Sample 2} & x_2 & n_2 - x_2 \\
\vdots & \vdots & \vdots \\
\text{Sample } k & x_k & n_k - x_k
\end{array}$$

let us refer to its entries as the observed cell frequencies $f_{ij}$, where the first subscript
indicates the row and the second subscript indicates the column of this $k \times 2$ table.

Under the null hypothesis $\theta_1 = \theta_2 = \cdots = \theta_k = \theta_0$, the expected cell
frequencies for the first column are $n_i\theta_0$ for $i = 1, 2, \ldots, k$, and those for the
second column are $n_i(1 - \theta_0)$. When $\theta_0$ is not known, we substitute for it,
as before, the pooled estimate $\hat{\theta}$, and estimate the expected cell frequencies as

$$e_{i1} = n_i\hat{\theta} \quad\text{and}\quad e_{i2} = n_i(1 - \hat{\theta})$$

for $i = 1, 2, \ldots,$ and $k$. It will be left to the reader to show in Exercise 1 on page
434 that the value of the chi-square statistic can thus be written as

$$\chi^2 = \sum_{i=1}^{k}\sum_{j=1}^{2} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$


EXAMPLE 13.10 


Determine, on the basis of the sample data shown in the following table, whether
the true proportion of shoppers favoring detergent A over detergent B is the
same in all three cities:

                 Number favoring   Number favoring
                 detergent A       detergent B       Total
Los Angeles           232               168            400
San Diego             260               240            500
Fresno                197               203            400

Use the 0.05 level of significance.

Solution
1. $H_0$: $\theta_1 = \theta_2 = \theta_3$
   $H_1$: $\theta_1$, $\theta_2$, and $\theta_3$ are not all equal
2. Reject the null hypothesis if $\chi^2 \ge 5.991$, where

$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{2} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

   and 5.991 is the value of $\chi^2_{0.05,2}$.
3. Since the pooled estimate of $\theta$ is

$$\hat{\theta} = \frac{232 + 260 + 197}{400 + 500 + 400} = \frac{689}{1{,}300} = 0.53$$

   the expected cell frequencies are

$$e_{11} = 400(0.53) = 212 \quad\text{and}\quad e_{12} = 400(0.47) = 188$$
$$e_{21} = 500(0.53) = 265 \quad\text{and}\quad e_{22} = 500(0.47) = 235$$
$$e_{31} = 400(0.53) = 212 \quad\text{and}\quad e_{32} = 400(0.47) = 188$$

   and substitution into the formula for $\chi^2$ above yields

$$\chi^2 = \frac{(232-212)^2}{212} + \frac{(260-265)^2}{265} + \frac{(197-212)^2}{212} + \frac{(168-188)^2}{188} + \frac{(240-235)^2}{235} + \frac{(203-188)^2}{188} = 6.48$$

4. Since $\chi^2 = 6.48$ exceeds 5.991, the null hypothesis must be rejected; in
   other words, the true proportions of shoppers favoring detergent A over
   detergent B in the three cities are not the same. ▲
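The $k \times 2$ computation of Example 13.10 is sketched below in Python, assuming the pooled estimate $\hat{\theta}$ is used because $\theta_0$ is not specified. The helper `chi_square_k_proportions` is a hypothetical name, and the critical value 5.991 is taken from the chi-square table. Carried at full precision, the statistic comes out at about 6.47; adding the six terms after rounding each to two decimals (1.89 + 0.09 + 1.06 + 2.13 + 0.11 + 1.20) gives the 6.48 shown in the example. Either way the decision is the same.

```python
def chi_square_k_proportions(successes, sizes):
    """Chi-square statistic for H0: theta_1 = ... = theta_k, theta unspecified.

    Builds the k x 2 table of observed frequencies f_ij (successes, failures)
    and expected frequencies e_ij from the pooled estimate theta_hat.
    Returns (chi2, degrees_of_freedom) with k - 1 degrees of freedom.
    """
    theta_hat = sum(successes) / sum(sizes)
    chi2 = 0.0
    for x, n in zip(successes, sizes):
        observed = (x, n - x)
        expected = (n * theta_hat, n * (1 - theta_hat))
        chi2 += sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return chi2, len(sizes) - 1


# Example 13.10: shoppers favoring detergent A in Los Angeles, San Diego, Fresno
chi2, df = chi_square_k_proportions([232, 260, 197], [400, 500, 400])
print(round(chi2, 2), df)   # 6.47 2
print(chi2 >= 5.991)        # True; 5.991 is chi^2_{0.05,2}
```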


THEORETICAL EXERCISES 


1. Show that the two formulas for $\chi^2$ on pages 432 and 433 are equivalent.


2. Modify the criteria on pages 429 and 430 so that they can be used to test the
null hypothesis $\lambda = \lambda_0$, where $\lambda$ is the parameter of the Poisson distribution,
on the basis of $n$ observations. (Hint: Use the result of Example 7.15.) Also
use Table II to find values corresponding to $k_{0.025}$ and $k'_{0.025}$ to test the null
hypothesis $\lambda = 3.6$ against the alternative $\lambda \ne 3.6$ at $\alpha = 0.05$ on the basis
of five observations.

3. For $k = 2$, show that the $\chi^2$ formula on page 433 can be written as

$$\chi^2 = \frac{(n_1 + n_2)(n_2x_1 - n_1x_2)^2}{n_1n_2(x_1 + x_2)\left[(n_1 + n_2) - (x_1 + x_2)\right]}$$


4. Show that for $k = 2$ and large samples the null hypothesis $\theta_1 = \theta_2$, where
$\theta_1$ and $\theta_2$ are the parameters of two binomial populations, can also be tested
by looking upon

$$z = \frac{\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}}{\sqrt{\hat{\theta}(1 - \hat{\theta})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $\hat{\theta} = \dfrac{x_1 + x_2}{n_1 + n_2}$, as a value of a random variable having the standard
normal distribution. (Hint: Refer to Exercise 6 on page 281.)


5. Show that the square of the expression given for $z$ in Exercise 4 equals

$$\chi^2 = \sum_{i=1}^{2} \frac{(x_i - n_i\hat{\theta})^2}{n_i\hat{\theta}(1 - \hat{\theta})}$$

so that the two tests are actually equivalent.


APPLIED EXERCISES 
6. The null hypothesis $\theta = 0.45$ is to be tested against the alternative $\theta < 0.45$
at $\alpha = 0.05$, where $\theta$ is the parameter of a binomial population with $n = 19$.
Use Table I to find $k'_{0.05}$ and the probabilities of committing type II errors
with this criterion when $\theta = 0.35$, $\theta = 0.30$, and $\theta = 0.25$.


7. The null hypothesis $\theta = 0.25$ is to be tested against the alternative $\theta > 0.25$
at $\alpha = 0.01$, where $\theta$ is the parameter of a binomial population with $n = 20$.
Use Table I to find $k_{0.01}$ and the probabilities of committing type II errors
with this criterion when $\theta = 0.35$, $\theta = 0.40$, and $\theta = 0.45$.


8. The null hypothesis $\theta = 0.70$ is to be tested against the alternative $\theta \ne 0.70$
at $\alpha = 0.05$, where $\theta$ is the parameter of a binomial population with $n = 18$.
Use Table I to find $k_{0.025}$ and $k'_{0.025}$ and the probabilities of committing type
II errors with this criterion when $\theta = 0.60$, $\theta = 0.65$, $\theta = 0.75$, and $\theta = 0.80$.


9. The null hypothesis $\theta = 0.40$ is to be tested against the alternative $\theta \ne 0.40$
at $\alpha = 0.01$, where $\theta$ is the parameter of a binomial population with $n = 16$.
Use Table I to find $k_{0.005}$ and $k'_{0.005}$ and the probabilities of committing type
II errors with this criterion when $\theta = 0.30$, $\theta = 0.35$, $\theta = 0.45$, and $\theta = 0.50$.

10. In a random sample of 600 cars making a right turn at a certain intersection,
157 pulled into the wrong lane. Use the level of significance $\alpha = 0.05$ to test
the null hypothesis that the actual proportion of drivers who make this
mistake (at the given intersection) is 0.30 against the alternative hypothesis
that this figure is incorrect either way.

11. The manufacturer of a spot remover claims that his product removes at least
90 percent of all spots. What can we conclude about this claim at $\alpha = 0.05$,
if the spot remover removed only 174 of 200 spots chosen at random from
spots on clothes brought to a dry cleaning establishment?

12. If 74 of 250 persons who watched a certain television program in black and
white and 92 of 250 persons who watched the same program in color
remembered two hours later what products were advertised, test the null
hypothesis that there is no difference between the corresponding population
proportions at the level of significance $\alpha = 0.01$.

13. To landscape its grounds, a bank purchased 400 tulip bulbs from one nursery
and 200 from another. If 46 of the 400 bulbs from the first nursery failed to
bloom while 18 of the 200 bulbs from the other nursery failed to bloom, test
the null hypothesis that there is no difference between the corresponding
population proportions at the level of significance $\alpha = 0.05$.

14. In a random sample of 200 persons who skipped breakfast, 82 reported that
they experienced midmorning fatigue, and in a random sample of 400 persons
who ate breakfast, 116 reported that they experienced midmorning fatigue.
Use the method of Exercise 4 and the level of significance $\alpha = 0.05$ to test
the null hypothesis that there is no difference between the corresponding
population proportions against the alternative that midmorning fatigue is
more prevalent among persons who skip breakfast.

15. If 26 of 200 tires of Brand A failed to last 20,000 miles, while the corresponding
figures for 200 tires of Brands B, C, and D were 23, 15, and 32, test the null
hypothesis that there is no difference in the quality of the four kinds of tires.
Use $\alpha = 0.05$.

16. In a random sample of 250 persons with low incomes 155 are for a certain
piece of legislation, while in random samples of 200 persons with average
incomes and 150 persons with high incomes there are, respectively, 118 and
87 who favor the legislation. Use $\alpha = 0.05$ to test the null hypothesis that
the proportion of persons favoring the legislation is the same for all three
groups.


13.7 r × c TABLES


The method we shall describe in this section applies to two kinds of problems, 
which differ conceptually but are analyzed in the same way. In the first kind of 
problem we deal with samples from r multinomial populations, with each trial 
permitting c possible outcomes. This would be the case, for instance, when
persons interviewed in five different precincts are asked whether they are for a
candidate, against her, or undecided. Here r = 5 and c = 3.

It would also have been the case in Example 13.10 if each shopper had 
been asked whether he or she favors detergent A, detergent B, or does not care
one way or the other. We might thus have obtained the results shown in the 
following 3 x 3 table: 


                  Number favoring   Number favoring   Number
                  detergent A       detergent B       indifferent   Total
Los Angeles            174                93               133         400
San Diego              196               124               180         500
Fresno                 148               105               147         400


The null hypothesis we would want to test in a problem like this is that we
are sampling r identical multinomial populations. Symbolically, if $\theta_{ij}$ is the
probability of the jth outcome for the ith population, we would want to test the
null hypothesis

$$\theta_{1j} = \theta_{2j} = \cdots = \theta_{rj}$$

for j = 1, 2, ..., c. The alternative hypothesis would be that $\theta_{1j}, \theta_{2j}, \ldots,$ and $\theta_{rj}$
are not all equal for at least one value of j.

In the preceding example we dealt with three samples, whose fixed sizes 
were given by the row totals, 400, 500, and 400; on the other hand, the column
totals were left to chance. In the other kind of problem where the method of this 
section applies, we are dealing with one sample and the row totals as well as the 
column totals are left to chance. 

To give an example, let us consider the following table obtained in a study
of the relationship, if any, between the I.Q.'s of persons who have gone through a large
company's job-training program and their subsequent performance on the job:


                               Performance
                          Poor    Fair    Good    Total
        Below average      67      64      25      156
I.Q.    Average            42      76      56      174
        Above average      10      23      37       70
        Total             119     163     118      400
Here there is one sample of size 400, and the row totals as well as the column
totals are left to chance. It is mainly in connection with problems like this that
r × c tables are referred to as contingency tables.

The null hypothesis we shall want to test by means of the table above is
that the on-the-job performance of persons who have gone through the training
program is independent of their I.Q. In general, if $\theta_{ij}$ is the probability that an
item will fall into the cell belonging to the ith row and the jth column, $\theta_{i\cdot}$ is the
probability that an item will fall into the ith row, and $\theta_{\cdot j}$ is the probability that
an item will fall into the jth column, the null hypothesis we would want to test is

$$\theta_{ij} = \theta_{i\cdot} \cdot \theta_{\cdot j}$$

for i = 1, 2, ..., r and j = 1, 2, ..., c. Correspondingly, the alternative hypothesis
would be that $\theta_{ij} \neq \theta_{i\cdot} \cdot \theta_{\cdot j}$ for at least one pair of values of i and j.

Since the method by which we analyze an r × c table is the same regardless
of whether we are dealing with r samples from multinomial populations with c 
different outcomes or one sample from a multinomial population with rc different 
outcomes, let us discuss it here with regard to the latter. In Exercise 2 on page 
442 the reader will be asked to parallel the work for the first kind of problem. 

In what follows, we shall denote the observed frequency for the cell in the
ith row and the jth column by $f_{ij}$, the row totals by $f_{i\cdot}$, the column totals by $f_{\cdot j}$,
and the grand total, the sum of all the cell frequencies, by f. With this notation,
we estimate the probabilities $\theta_{i\cdot}$ and $\theta_{\cdot j}$ as

$$\hat{\theta}_{i\cdot} = \frac{f_{i\cdot}}{f} \quad\text{and}\quad \hat{\theta}_{\cdot j} = \frac{f_{\cdot j}}{f}$$

and under the null hypothesis of independence we get

$$e_{ij} = \hat{\theta}_{i\cdot} \cdot \hat{\theta}_{\cdot j} \cdot f = \frac{f_{i\cdot}}{f} \cdot \frac{f_{\cdot j}}{f} \cdot f = \frac{f_{i\cdot} \cdot f_{\cdot j}}{f}$$

for the expected frequency for the cell in the ith row and the jth column. Note
that $e_{ij}$ is thus obtained by multiplying the total of the row to which the cell belongs
by the total of the column to which it belongs, and then dividing by the grand total.
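A minimal Python sketch of the expected-frequency rule, applied to the I.Q. and job-performance table above. The helper name `expected_frequencies` is hypothetical; the code simply multiplies each row total by each column total and divides by the grand total, with no claim to mirror any particular statistics library.

```python
def expected_frequencies(table):
    """Return e_ij = (row total i) * (column total j) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed frequencies f_ij: rows are I.Q. (below average, average, above average),
# columns are job performance (poor, fair, good).
iq_table = [
    [67, 64, 25],
    [42, 76, 56],
    [10, 23, 37],
]

expected = expected_frequencies(iq_table)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row of `expected` sums to the corresponding row total (156, 174, and 70), which is the property the reader is asked to verify in general in Exercise 1 on page 442.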
Once we have calculated the $e_{ij}$, we base our decision on the value of

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

and reject the null hypothesis if it exceeds $\chi^2_{\alpha,(r-1)(c-1)}$.

The number of degrees of freedom is (r − 1)(c − 1), and in connection
with this let us make the following observation: Whenever expected cell frequencies
in chi-square formulas are estimated on the basis of sample count data, the
number of degrees of freedom is s − t − 1, where s is the number of terms in
the summation and t is the number of independent parameters replaced by
estimates. When testing for differences among k proportions with the chi-square
statistic of Section 13.6, we had s = 2k and t = k, since we had to estimate the
k parameters $\theta_1, \theta_2, \ldots, \theta_k$, and the number of degrees of freedom was
2k − k − 1 = k − 1. When testing for independence with an r × c contingency table, we
have s = rc and t = r + c − 2, since the r parameters $\theta_{i\cdot}$ and the c parameters
$\theta_{\cdot j}$ are not all independent: their respective sums must equal 1. Thus, we get

$$s - t - 1 = rc - (r + c - 2) - 1 = (r - 1)(c - 1)$$

Since the test statistic which we have described has only approximately a 
chi-square distribution with (r — 1)(c — 1) degrees of freedom, it is customary 
to use this test only when none of the $e_{ij}$ is less than 5; sometimes this requires
that we combine some of the cells with a corresponding loss in the number of 
degrees of freedom. 


EXAMPLE 13.11 


For the data shown in the following table, test for independence between a
person's ability in mathematics and his or her interest in statistics. Use the 0.01
level of significance:

                                  Ability in mathematics
                               Low     Average     High
                  Low           63        42         15
Interest in       Average       58        61         31
statistics        High          14        47         29


Solution 


1. $H_0$: Ability in mathematics and interest in statistics are independent.
$H_1$: These two variables are not independent.
2. Reject the null hypothesis if χ² ≥ 13.277, where

$$\chi^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

and 13.277 is the value of $\chi^2_{0.01,4}$.
3. The expected frequencies for the first row are $\frac{120 \cdot 135}{360} = 45.0$,
$\frac{120 \cdot 150}{360} = 50.0$, and 120 − 45.0 − 50.0 = 25.0, where we made use
of the fact that for each row or column the sum of the expected cell
frequencies equals the sum of the corresponding observed frequencies
(see Exercise 1 on page 442). Similarly, the expected frequencies for
the second row are 56.25, 62.5, and 31.25, and those for the third row
(all obtained by subtraction from the column totals) are 33.75, 37.5,
and 18.75. Then, substitution into the formula for χ² yields

$$\chi^2 = \frac{(63 - 45.0)^2}{45.0} + \frac{(42 - 50.0)^2}{50.0} + \cdots + \frac{(29 - 18.75)^2}{18.75} = 32.14$$

4. Since χ² = 32.14 exceeds 13.277, the null hypothesis must be rejected;
we conclude that there is a relationship between a person's ability in
mathematics and his or her interest in statistics. ▲
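The arithmetic of this example can be checked with the short Python sketch below. It assumes the third row of observed frequencies is 14, 47, and 29, the values implied by the column totals 135, 150, and 75 behind the expected frequencies above; the function name `chi_square_independence` is hypothetical, and the critical value 13.277 is copied from the text rather than computed.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    f = sum(row_totals)  # grand total
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / f  # expected frequency e_ij
            stat += (observed - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: interest in statistics (low, average, high);
# columns: ability in mathematics (low, average, high).
table = [
    [63, 42, 15],
    [58, 61, 31],
    [14, 47, 29],
]
stat, df = chi_square_independence(table)
critical = 13.277  # chi-square_{0.01,4}, taken from the text
print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {stat >= critical}")
# → chi2 = 32.14, df = 4, reject H0: True
```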


13.8 GOODNESS OF FIT


The goodness-of-fit test considered here applies to situations in which we want
to determine whether a set of data may be looked upon as a random sample
from a population having a given distribution. A second kind of “goodness of
fit,” which applies to the fitting of a curve to a set of paired data, will be discussed
in Chapter 14. To illustrate, suppose that we want to decide on the basis of the
data (observed frequencies) shown in the following table whether the number of
errors a compositor makes in setting a galley of type is a random variable having
a Poisson distribution:


Number of    Observed       Poisson probabilities    Expected
errors       frequencies    with λ = 3               frequencies
             f_i                                     e_i
0             18            0.0498                   21.9
1             53            0.1494                   65.7
2            103            0.2240                   98.6
3            107            0.2240                   98.6
4             82            0.1680                   73.9
5             46            0.1008                   44.0
6             18            0.0504                   22.2
7             10            0.0216                    9.5
8              2            0.0081                    3.6
9              1            0.0038                    1.7
To determine a corresponding set of expected frequencies for a random
sample from a Poisson population, we first use the mean of the observed distribution
to estimate the Poisson parameter λ, getting $\bar{x} = \frac{1{,}341}{440} = 3.05$ or, approximately,
λ = 3. Then, copying the Poisson probabilities for λ = 3 from Table II
(with the probability of 9 or more used instead of the probability of 9) and
multiplying by 440, the total frequency, we get the expected frequencies shown
in the right-hand column of the table. To test the null hypothesis that the observed
frequencies constitute a random sample from a Poisson population, we must
judge how good a fit, or how close an agreement, we have between the two sets
of frequencies. In general, to test the null hypothesis $H_0$ that a set of observed
data comes from a population having a specified distribution against the alternative
that the population has some other distribution, we compute


$$\chi^2 = \sum_{i=1}^{m} \frac{(f_i - e_i)^2}{e_i}$$

and reject $H_0$ at the level of significance α if $\chi^2 \geq \chi^2_{\alpha,m-t-1}$, where m is the
number of terms in the summation and t is the number of independent parameters
estimated on the basis of the sample data (see the discussion on page 439). In the above
illustration, t = 1 since only one parameter is estimated on the basis of the data,
and the number of degrees of freedom is m − 2.
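The expected frequencies can be generated with the following Python sketch, assuming the observed frequency for 9 errors is 1 (so that the frequencies total 440). It evaluates the Poisson probabilities directly with `math.exp` and `math.factorial` instead of copying four-decimal values from Table II, and it lumps 9 or more errors into the last class.

```python
import math

# Observed frequencies for 0, 1, ..., 9 errors (assumes 1 observation at 9 errors).
observed = [18, 53, 103, 107, 82, 46, 18, 10, 2, 1]
total = sum(observed)  # 440
mean = sum(k * f for k, f in enumerate(observed)) / total  # 1341/440, about 3.05
lam = 3  # the text rounds the estimate to lambda = 3


def poisson_pmf(k, lam):
    """P(k) = e^(-lam) * lam^k / k! for a Poisson random variable."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


probs = [poisson_pmf(k, lam) for k in range(9)]  # 0 through 8 errors
probs.append(1 - sum(probs))                      # "9 or more" as the upper tail
expected = [total * p for p in probs]
for k, (f, e) in enumerate(zip(observed, expected)):
    print(k, f, round(e, 1))
```

Rounded to one decimal, the values for 0 through 4 errors agree with the table; for 5 errors the computation gives about 44.4, which is 440 × 0.1008.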


EXAMPLE 13.12 


For the data in the table on page 440, test at the 0.05 level of significance whether 
the number of errors the compositor makes in setting a galley of type is a random 
variable having a Poisson distribution. 


Solution 


(Since the expected frequencies corresponding to 8 and 9 errors are less
than 5, the two classes are combined.)
1. $H_0$: Random variable has a Poisson distribution.

$H_1$: Random variable does not have a Poisson distribution.

2. Reject the null hypothesis if χ² ≥ 14.067, where

$$\chi^2 = \sum_{i=1}^{m} \frac{(f_i - e_i)^2}{e_i}$$

and 14.067 is the value of $\chi^2_{0.05,7}$.
3. Substituting into the formula for χ², we get

$$\chi^2 = \frac{(18 - 21.9)^2}{21.9} + \frac{(53 - 65.7)^2}{65.7} + \cdots + \frac{(3 - 5.3)^2}{5.3} = 6.86$$

4. Since χ² = 6.86 does not exceed 14.067, the null hypothesis cannot
be rejected; indeed, the close agreement between the observed and
expected frequencies suggests that the Poisson distribution provides a
“good fit.” ▲
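The following Python sketch repeats step 3 with the rounded expected frequencies from the table, combining the classes for 8 and 9 errors into an observed frequency of 3 and an expected frequency of 5.3 (taken here as 3.6 + 1.7, an assumption about the last table entry). The critical value 14.067 is copied from the text.

```python
# Classes 0, 1, ..., 7 errors, then "8 or more" (8 and 9 combined).
observed = [18, 53, 103, 107, 82, 46, 18, 10, 2 + 1]
expected = [21.9, 65.7, 98.6, 98.6, 73.9, 44.0, 22.2, 9.5, 3.6 + 1.7]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
m = len(observed)  # number of terms after combining classes
t = 1              # lambda was estimated from the data
df = m - t - 1
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= 14.067}")
# → chi2 = 6.86, df = 7, reject H0: False
```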


THEORETICAL EXERCISES 


1. Verify that if the expected cell frequencies are calculated in accordance with
the rule on page 438, their sum for any row or column equals the sum of the
corresponding observed frequencies.

2. Show that the rule on page 438 for calculating the expected cell frequencies
applies also when we test the null hypothesis that we are sampling r populations
with identical multinomial distributions.

3. Verify that the following computing formula for χ² is equivalent to the formula
given on page 438:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{f_{ij}^2}{e_{ij}} - f$$

4. Use the formula of the preceding exercise to recalculate χ² for Example 13.10.
5. If the analysis of a contingency table shows that there is a relationship
between the two variables under consideration, the strength of this relationship
may be measured by means of the contingency coefficient

$$C = \sqrt{\frac{\chi^2}{\chi^2 + f}}$$

where χ² is the value obtained for the test statistic and f is the grand total
as defined on page 438. Show that

(a) for a 2 × 2 contingency table the maximum value of C is $\frac{1}{2}\sqrt{2}$;
(b) for a 3 × 3 contingency table the maximum value of C is $\frac{1}{3}\sqrt{6}$.


APPLIED EXERCISES 


6. In a study of parents' feelings about a required course in sex education, a
random sample of 360 parents is classified according to whether they have
one, two, or three or more children in the school system, and also whether
they feel that the course is poor, adequate, or good. Based on the results
shown in the following table, test at the 0.05 level of significance whether
there is a relationship between parents' reaction to the course and the number
of children they have in the school system:


Number of children 
1 2 3 or more 
Poor 
Adequate 
Good 


7. Tests of the fidelity and the selectivity of 190 radios produced the results
shown in the following table:

                             Fidelity
                      Low     Average     High
               Low
Selectivity    Average
               High

Use the 0.01 level of significance to test the null hypothesis that fidelity is
independent of selectivity.


8. The following sample data pertain to the shipments received by a large firm
from three different vendors:

              Number      Number imperfect    Number
              rejected    but acceptable      perfect
Vendor A        12             23               89
Vendor B         8             12               62
Vendor C        21             30               19

Test at the 0.01 level of significance whether the three vendors ship products
of equal quality.

9. Analyze the 3 × 3 table on page 437 which pertains to the responses of
shoppers in three cities with regard to two detergents. Use the 0.05 level of 
significance. 

10. Four coins are tossed 160 times and 0, 1, 2, 3, or 4 heads showed, respectively, 
19, 54, 58, 23, and 6 times. Use the 0.05 level of significance to test whether 
it is reasonable to suppose that the coins are balanced and randomly tossed. 


11. It is desired to test whether the number of gamma rays emitted per second
by a certain radioactive substance is a random variable having the Poisson
distribution with λ = 2.4. Use the following data obtained for 300 one-second
intervals to test this null hypothesis at the 0.05 level of significance:

Number of
gamma rays     Frequency
0              19
1              48
2              66
3              74
4              44
5              35
6              10
7 or more       4


12. Each day, Monday through Saturday, a baker bakes three large chocolate 
cakes, and those not sold on the same day are given away to a food bank. 
Use the data shown in the following table to test at the 0.05 level of significance
whether they may be looked upon as values of a binomial random variable: 


Number of      Number of
cakes sold     days
0 1 
1 16 
2 55 
3 228 


13. The following is the distribution of the readings obtained with a Geiger 
counter of the number of particles emitted by a radioactive substance in 100 
successive 40-second intervals: 
Number of
particles      Frequency
5-9 1 
10-14 10 
15-19 37 
20-24 36 
25-29 13 
30-34 2 
35-39 1 


(a) Verify that the mean and the standard deviation of this distribution are,
respectively, x̄ = 20 and s = 5.

(b) Find the probabilities that a random variable having a normal distribution
with μ = 20 and σ = 5 will take on a value less than 9.5, between
9.5 and 14.5, between 14.5 and 19.5, between 19.5 and 24.5, between
24.5 and 29.5, between 29.5 and 34.5, and greater than 34.5.

(c) Find the expected normal curve frequencies for the various classes by 
multiplying the probabilities obtained in part (b) by the total frequency, 
and then test at the 0.05 level of significance whether the data may be 
looked upon as a random sample from a normal population. 
Regression
and Correlation 


INTRODUCTION 


A major objective of many statistical investigations is to establish relationships 
which make it possible to predict one or more variables in terms of others. Thus, 
studies are made to predict the potential sales of a new product in terms of its 
price, a patient's weight in terms of the number of weeks he or she has been on 
a diet, family expenditures on entertainment in terms of family income, the per 
capita consumption of certain foods in terms of their nutritional values and the 
amount of money spent advertising them on television, and so forth. 

Although it is, of course, desirable to be able to predict one quantity exactly 
in terms of others, this is seldom possible, and in most instances we have to be 
satisfied with predicting averages or expected values. Thus, we may not be able 
to predict exactly how much money Mr. Brown will make ten years after graduat- 
ing from college, but, given suitable data, we can predict the average income of 
a college graduate in terms of the number of years he has been out of college.
Similarly, we can at best predict the average yield of a given variety of wheat in 
terms of data on the rainfall in July, and we can at best predict the average 
performance of students starting college in terms of their I.Q.'s. 

Formally, if we are given the joint distribution of two random variables x
and y and x is known to take on the value x, the basic problem of bivariate
regression is that of determining the conditional mean $\mu_{y|x}$, namely, the “average”
value of y for the given value of x. [The term “regression,” as it is used here,
dates back to Francis Galton, who employed it first in connection with a study
of the heights of fathers and sons, in which he observed a regression (a “turning
back”) from the heights of sons to the heights of their fathers.] In problems
involving more than two random variables, that is, in multiple regression, we are
correspondingly concerned with quantities such as $\mu_{z|x,y}$, the mean value of z for
given values of x and y, $\mu_{x_4|x_1,x_2,x_3}$, the mean value of $x_4$ for given values of $x_1$,
$x_2$, and $x_3$, and so on.

If f(x, y) is the value of the joint density of two random variables x and y
at (x, y), the problem of bivariate regression is simply that of determining the
conditional density of y given x = x and then evaluating the integral

$$\mu_{y|x} = E(y|x) = \int_{-\infty}^{\infty} y \cdot w(y|x)\, dy$$

as outlined in Section 4.8. The resulting equation is called the regression equation
of y on x. Alternatively, we might be interested in the regression equation

$$\mu_{x|y} = E(x|y) = \int_{-\infty}^{\infty} x \cdot f(x|y)\, dx$$


In the discrete case, where we are dealing with probability distributions instead 
of probability densities, the integrals of the two preceding regression equations 
are simply replaced by sums. 

When we do not know the joint density of the two random variables, or at 
least not all of its parameters, the determination of $\mu_{y|x}$ and $\mu_{x|y}$ becomes a
problem of estimation based on sample data; this is an entirely different problem, 
which we shall discuss in Sections 14.3 and 14.4. 


EXAMPLE 14.1 
Given the random variables x and y which have the joint density

$$f(x, y) = \begin{cases} x e^{-x(1+y)} & \text{for } x > 0 \text{ and } y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find the regression equation of y on x and sketch the regression curve.


Solution
Integrating out y, we find that the marginal density of x is given by

$$g(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

and, hence, the conditional density of y given x = x is given by

$$w(y|x) = \frac{f(x, y)}{g(x)} = \frac{x e^{-x(1+y)}}{e^{-x}} = x e^{-xy}$$

for y > 0 and w(y|x) = 0 elsewhere, which we recognize as an exponential
density with $\theta = \frac{1}{x}$. Hence, by evaluating

$$\mu_{y|x} = \int_{0}^{\infty} y \cdot x e^{-xy}\, dy$$

or by referring to Corollary 1 of Theorem 6.3, we find that the regression
equation of y on x is given by

$$\mu_{y|x} = \frac{1}{x}$$

The corresponding regression curve is shown in Figure 14.1. ▲
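As a numerical check on Example 14.1, the following Python sketch approximates the integral for $\mu_{y|x}$ with Simpson's rule (a swapped-in numerical method, not part of the text) and compares it with 1/x. The function name is hypothetical, and truncating the integral at y = 50/x is an assumption that is harmless here because the integrand decays exponentially.

```python
import math


def conditional_mean_y(x, n=20_000):
    """Approximate mu_{y|x}, the integral of y * w(y|x) over y > 0, by Simpson's rule,
    with w(y|x) = x * exp(-x * y) as in Example 14.1 (n must be even)."""
    upper = 50.0 / x  # the integrand is about 50 * exp(-50) there, so the tail is negligible

    def integrand(y):
        return y * x * math.exp(-x * y)

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


for x in (0.5, 1.0, 2.0, 4.0):
    print(x, round(conditional_mean_y(x), 6), 1 / x)
```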


Figure 14.1 Regression curve of Example 14.1.


EXAMPLE 14.2 


If x and y have the multinomial distribution

$$f(x, y) = \binom{n}{x, y, n-x-y} \theta_1^x \theta_2^y (1 - \theta_1 - \theta_2)^{n-x-y}$$

for x = 0, 1, 2, ..., n and y = 0, 1, 2, ..., n, subject to the restriction that
x + y ≤ n, find the regression equation of y on x.


Solution


The marginal distribution of x is given by

$$g(x) = \sum_{y=0}^{n-x} \binom{n}{x, y, n-x-y} \theta_1^x \theta_2^y (1 - \theta_1 - \theta_2)^{n-x-y} = \binom{n}{x} \theta_1^x (1 - \theta_1)^{n-x}$$

for x = 0, 1, 2, ..., n, which we recognize as a binomial distribution with
the parameters n and $\theta_1$. Hence,

$$w(y|x) = \frac{f(x, y)}{g(x)} = \frac{\binom{n-x}{y} \theta_2^y (1 - \theta_1 - \theta_2)^{n-x-y}}{(1 - \theta_1)^{n-x}}$$

for y = 0, 1, 2, ..., n − x, and, rewriting this formula as

$$w(y|x) = \binom{n-x}{y} \left(\frac{\theta_2}{1 - \theta_1}\right)^{y} \left(\frac{1 - \theta_1 - \theta_2}{1 - \theta_1}\right)^{n-x-y}$$

we find by inspection that the conditional distribution of y given x = x is
a binomial distribution with the parameters n − x and $\frac{\theta_2}{1 - \theta_1}$, so that the
regression equation of y on x is

$$\mu_{y|x} = \frac{(n - x)\theta_2}{1 - \theta_1}$$

according to Theorem 5.2. ▲


With reference to the preceding example, if we let x be the number of times
an even number comes up in 30 rolls of a balanced die, and y the number of
times the result is a five, then the regression equation becomes

$$\mu_{y|x} = \frac{(30 - x)\cdot\frac{1}{6}}{1 - \frac{1}{2}} = \frac{30 - x}{3}$$

This stands to reason, because there are three equally likely possibilities, 1, 3, or
5, for each of the 30 − x outcomes that are not even.
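For the dice illustration, the conditional mean can also be computed directly from the multinomial probabilities, without first recognizing the conditional binomial distribution. The Python sketch below does this with exact rational arithmetic (`fractions.Fraction`), so the result can be compared with (30 − x)/3 exactly; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def trinomial_pmf(x, y, n, t1, t2):
    """f(x, y) of Example 14.2: n!/(x! y! (n-x-y)!) t1^x t2^y (1-t1-t2)^(n-x-y)."""
    return comb(n, x) * comb(n - x, y) * t1**x * t2**y * (1 - t1 - t2) ** (n - x - y)


def conditional_mean_y(x, n, t1, t2):
    """E(y | x) as the sum of y*f(x, y) divided by the sum of f(x, y), over y = 0..n-x."""
    weights = [trinomial_pmf(x, y, n, t1, t2) for y in range(n - x + 1)]
    return sum(y * w for y, w in enumerate(weights)) / sum(weights)


# x = number of even results, y = number of fives, in 30 rolls of a balanced die.
n, t1, t2 = 30, Fraction(1, 2), Fraction(1, 6)
for x in (0, 12, 30):
    print(x, conditional_mean_y(x, n, t1, t2), Fraction(30 - x, 3))
```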
EXAMPLE 14.3
If the joint density of $x_1$, $x_2$, and $x_3$ is given by

$$f(x_1, x_2, x_3) = \begin{cases} (x_1 + x_2) e^{-x_3} & \text{for } 0 < x_1 < 1,\ 0 < x_2 < 1,\ x_3 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

find the regression equation of $x_2$ on $x_1$ and $x_3$.


Solution


Referring to Example 3.22 on page 121, we find that the joint marginal
density of $x_1$ and $x_3$ is given by

$$m(x_1, x_3) = \begin{cases} \left(x_1 + \frac{1}{2}\right) e^{-x_3} & \text{for } 0 < x_1 < 1,\ x_3 > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Therefore,

$$\mu_{x_2|x_1,x_3} = \int_{-\infty}^{\infty} x_2 \cdot \frac{f(x_1, x_2, x_3)}{m(x_1, x_3)}\, dx_2 = \int_{0}^{1} \frac{x_2 (x_1 + x_2)}{x_1 + \frac{1}{2}}\, dx_2 = \frac{x_1 + \frac{2}{3}}{2x_1 + 1}$$

▲
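A quick numerical check of Example 14.3 in Python: the sketch integrates $x_2$ times the conditional density with Simpson's rule (exact for this quadratic integrand, apart from floating-point rounding) and compares the result with $(x_1 + \frac{2}{3})/(2x_1 + 1)$ for several values of $x_1$ and $x_3$. The function name `regression_x2` is hypothetical.

```python
import math


def regression_x2(x1, x3, n=100):
    """E(x2 | x1, x3) for Example 14.3 by Simpson's rule over 0 < x2 < 1 (n even)."""
    marginal = (x1 + 0.5) * math.exp(-x3)  # m(x1, x3)

    def integrand(x2):
        joint = (x1 + x2) * math.exp(-x3)  # f(x1, x2, x3) inside its support
        return x2 * joint / marginal

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


for x1 in (0.1, 0.5, 0.9):
    for x3 in (0.5, 3.0):
        closed_form = (x1 + 2 / 3) / (2 * x1 + 1)
        print(x1, x3, round(regression_x2(x1, x3), 6), round(closed_form, 6))
```

The printed values do not change with $x_3$, in line with the remark that follows the example.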


Note that the conditional expectation obtained in the preceding example
depends on $x_1$ but not on $x_3$. This could have been expected since we indicated
on page 127 that there is pairwise independence between $x_2$ and $x_3$.


14.2 LINEAR REGRESSION


An important feature of Example 14.2 is that the regression equation is linear, 
namely, that it is of the form 


$$\mu_{y|x} = \alpha + \beta x$$

where α and β are constants, called the regression coefficients. There are several
reasons why linear regression equations are of special interest: First, they lend 
themselves readily to further mathematical treatment; then, they often provide 
good approximations to otherwise complicated regression equations; and, finally, 
in the case of the bivariate normal distribution, which we studied in Section 6.7, 
the regression equations are, in fact, linear. 
To simplify the study of linear regression equations, let us express the
regression coefficients α and β in terms of some of the lower moments of the
joint distribution of x and y, namely, in terms of $E(x) = \mu_1$, $E(y) = \mu_2$, $\text{var}(x) = \sigma_1^2$,
$\text{var}(y) = \sigma_2^2$, and $\text{cov}(x, y) = \sigma_{12}$. Then, also using the correlation coefficient

$$\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$$

defined in Section 6.7, we can prove the following results:


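THEOREM 14.1 If the regression of y on x is linear, then

$$\mu_{y|x} = \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x - \mu_1)$$

and if the regression of x on y is linear, then

$$\mu_{x|y} = \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(y - \mu_2)$$


Proof. Since $\mu_{y|x} = \alpha + \beta x$, it follows that

$$\int_{-\infty}^{\infty} y \cdot w(y|x)\, dy = \alpha + \beta x$$

and if we multiply the expression on both sides of this equation by g(x),
the corresponding value of the marginal density of x, and integrate on x,
we obtain

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y \cdot w(y|x)\, g(x)\, dy\, dx = \alpha \int_{-\infty}^{\infty} g(x)\, dx + \beta \int_{-\infty}^{\infty} x \cdot g(x)\, dx$$

or

$$\mu_2 = \alpha + \beta \mu_1$$

since w(y|x)g(x) = f(x, y). If we had multiplied the equation for $\mu_{y|x}$ on
both sides by x · g(x) before integrating on x, we would have obtained

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy \cdot f(x, y)\, dy\, dx = \alpha \int_{-\infty}^{\infty} x \cdot g(x)\, dx + \beta \int_{-\infty}^{\infty} x^2 \cdot g(x)\, dx$$

or

$$E(xy) = \alpha \mu_1 + \beta E(x^2)$$

Solving $\mu_2 = \alpha + \beta\mu_1$ and $E(xy) = \alpha\mu_1 + \beta E(x^2)$ for α and β and
making use of the fact that $E(xy) = \sigma_{12} + \mu_1\mu_2$ and $E(x^2) = \sigma_1^2 + \mu_1^2$, we
find that

$$\alpha = \mu_2 - \frac{\sigma_{12}}{\sigma_1^2}\mu_1 = \mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1 \quad\text{and}\quad \beta = \frac{\sigma_{12}}{\sigma_1^2} = \rho\frac{\sigma_2}{\sigma_1}$$

This enables us to write the linear regression equation of y on x as

$$\mu_{y|x} = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)$$

When the regression of x on y is linear, similar steps lead to the equation

$$\mu_{x|y} = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2) \qquad \blacksquare$$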
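The two relations obtained in the proof, $\beta = \sigma_{12}/\sigma_1^2$ and $\alpha = \mu_2 - \beta\mu_1$, are easy to evaluate mechanically. The Python sketch below applies them to the dice illustration following Example 14.2, using the standard multinomial facts $\text{var}(x) = n\theta_1(1 - \theta_1)$ and $\text{cov}(x, y) = -n\theta_1\theta_2$ (not derived in this section); the helper name is hypothetical.

```python
from fractions import Fraction


def linear_regression_from_moments(mu1, mu2, var1, cov12):
    """alpha and beta of mu_{y|x} = alpha + beta*x, as in the proof of Theorem 14.1:
    beta = sigma_12 / sigma_1^2 and alpha = mu_2 - beta * mu_1."""
    beta = cov12 / var1
    alpha = mu2 - beta * mu1
    return alpha, beta


# x = number of even results and y = number of fives in n = 30 rolls of a balanced die.
n, t1, t2 = 30, Fraction(1, 2), Fraction(1, 6)
mu1, mu2 = n * t1, n * t2   # binomial means of x and y
var1 = n * t1 * (1 - t1)    # binomial variance of x
cov12 = -n * t1 * t2        # multinomial covariance of x and y
alpha, beta = linear_regression_from_moments(mu1, mu2, var1, cov12)
print(alpha, beta)  # → 10 -1/3, i.e. mu_{y|x} = 10 - x/3 = (30 - x)/3
```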


It follows from Theorem 14.1 that if the regression equation is linear and
ρ = 0, then $\mu_{y|x}$ does not depend on x (or $\mu_{x|y}$ does not depend on y). When
ρ = 0 and, hence, $\sigma_{12} = 0$, the two random variables x and y are uncorrelated,
and we can paraphrase the assertion which we made on page 164 by saying that 
if two random variables are independent they are also uncorrelated, but if two 
random variables are uncorrelated they are not necessarily independent; the latter 
is again illustrated in Exercise 9 on page 458. 

The correlation coefficient and its estimates are of importance in many
statistical investigations, and they will be discussed in some detail in Section
14.5. At this time, let us again point out that −1 ≤ ρ ≤ +1, as the reader will
be asked to prove in Exercise 11 on page 459, and that the sign of ρ tells us
directly whether the slope of a regression line is upward or downward.


14.3 THE METHOD OF LEAST SQUARES


In the preceding sections we have discussed the problem of regression only in 
connection with random variables having known joint distributions. In actual 
practice, there are many problems where a set of paired data gives the indication 
that the regression is linear, where we do not know the joint distribution of the 
random variables under consideration, but nevertheless want to estimate the 
regression coefficients α and β. Problems of this kind are usually handled by the
method of least squares, a method of curve fitting suggested early in the nineteenth 
century by the French mathematician Adrien Legendre. 

To illustrate this technique, let us consider the following data on the number 
of hours which ten persons studied for a French test and their scores on the test: 


Hours studied    Test score
x                y
4                31
9                58
10               65
14               73
4                37
7                44
12               60
22               91
1                21
17               84

Plotting these data as in Figure 14.2, we get the impression that a straight line 
provides a reasonably good fit. Although the points do not all fall on a straight 
line, the overall pattern suggests that the average test score for a given number 
of hours studied may well be related to the number of hours studied by means 
of an equation of the form $\mu_{y|x} = \alpha + \beta x$.

Once we have decided in a given problem that the regression is approximately
linear, we face the problem of estimating the coefficients α and β from
the sample data. In other words, we face the problem of obtaining estimates $\hat{\alpha}$
and $\hat{\beta}$ such that the estimated regression line $\hat{y} = \hat{\alpha} + \hat{\beta}x$ in some sense provides
the best possible fit to the given data.

Denoting the vertical deviation from a point to the line by $e_i$, as indicated
in Figure 14.3, the least squares criterion on which we shall base this “goodness
of fit” requires that we minimize the sum of the squares of these deviations. Thus,
if we are given a set of paired data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the least squares
estimates of the regression coefficients are the values $\hat{\alpha}$ and $\hat{\beta}$ for which the
quantity

$$q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - (\hat{\alpha} + \hat{\beta}x_i)\right]^2$$

is a minimum. Differentiating partially with respect to $\hat{\alpha}$ and $\hat{\beta}$, and equating
these partial derivatives to zero, we obtain

$$\frac{\partial q}{\partial \hat{\alpha}} = \sum_{i=1}^{n} (-2)\left[y_i - (\hat{\alpha} + \hat{\beta}x_i)\right] = 0$$
Figure 14.2 Data on test scores and number of hours studied (test score plotted against hours studied).


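and

$$\frac{\partial q}{\partial \hat{\beta}} = \sum_{i=1}^{n} (-2)x_i\left[y_i - (\hat{\alpha} + \hat{\beta}x_i)\right] = 0$$

which yield the so-called system of normal equations

$$\sum_{i=1}^{n} y_i = \hat{\alpha}n + \hat{\beta}\sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = \hat{\alpha}\sum_{i=1}^{n} x_i + \hat{\beta}\sum_{i=1}^{n} x_i^2$$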
Figure 14.3 Least squares criterion.


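Solving this system of equations by using determinants or the method of
elimination, we find that the least squares estimate of β is

$$\hat{\beta} = \frac{n\left(\sum x_i y_i\right) - \left(\sum x_i\right)\left(\sum y_i\right)}{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2}$$

Then we can write the least squares estimate of α as

$$\hat{\alpha} = \frac{\sum y_i - \hat{\beta} \cdot \sum x_i}{n}$$

by solving the first of the two normal equations for $\hat{\alpha}$. This formula for $\hat{\alpha}$ can
also be written as

$$\hat{\alpha} = \bar{y} - \hat{\beta} \cdot \bar{x}$$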
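A minimal Python sketch that solves the two normal equations by determinants (Cramer's rule), applied to the hours-studied data on page 453. The function names are hypothetical; `sum_of_squares` evaluates the quantity q so that one can confirm that nearby coefficients give a larger value.

```python
def least_squares_normal_equations(xs, ys):
    """Solve  sum(y)  = a*n      + b*sum(x)
              sum(xy) = a*sum(x) + b*sum(x^2)
    for the estimates (a, b) by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx ** 2  # zero only when all x's are equal
    alpha = (sy * sxx - sx * sxy) / det
    beta = (n * sxy - sx * sy) / det
    return alpha, beta


def sum_of_squares(a, b, xs, ys):
    """The least squares criterion q = sum of [y_i - (a + b*x_i)]^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]
alpha, beta = least_squares_normal_equations(hours, scores)
print(round(alpha, 2), round(beta, 3))  # → 21.69 3.471
```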


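To simplify the formula for $\hat{\beta}$ as well as some of the formulas we shall
meet in Sections 14.4 and 14.5, let us introduce the following notation:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$

and

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$$

We can thus write


THEOREM 14.2 Given the sample data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the
coefficients of the least squares line $\hat{y} = \hat{\alpha} + \hat{\beta}x$ are

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}$$

and

$$\hat{\alpha} = \bar{y} - \hat{\beta} \cdot \bar{x}$$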

EXAMPLE 14.4 


With reference to the data on page 453, find the equation of the least squares
line that approximates the regression of test scores on the number of hours studied.


Solution


We find that n = 10, Σx = 100, Σx² = 1,376, Σy = 564, and Σxy =
6,945, so that

$$S_{xx} = 1{,}376 - \frac{1}{10}(100)^2 = 376$$

and

$$S_{xy} = 6{,}945 - \frac{1}{10}(100)(564) = 1{,}305$$

Thus, $\hat{\beta} = \frac{1{,}305}{376} = 3.471$ and $\hat{\alpha} = \frac{564}{10} - 3.471 \cdot \frac{100}{10} = 21.69$, and the
equation of the least squares line is

$$\hat{y} = 21.69 + 3.471x$$

▲


With the equation obtained in the preceding example, we can predict, for 
instance, that a person who studies 14 hours for the test will get a score of 
21.69 + 3.471(14) = 70.284 (or 70 rounded to the nearest unit). Since we did not 
make any assumptions about the joint distribution of the random variables with 
which we are concerned, we cannot judge the "goodness" of this prediction; 
also, we cannot judge the “goodness” of the estimates $\hat{\alpha}$ = 21.69 and $\hat{\beta}$ = 3.471.
This kind of problem will be discussed in Section 14.4. 
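The same coefficients follow from Theorem 14.2. This Python sketch computes $S_{xx}$, $S_{xy}$, the least squares coefficients, and the predicted score for 14 hours of study; the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Coefficients of y-hat = a + b*x via Theorem 14.2: b = S_xy / S_xx, a = y-bar - b*x-bar."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    return a, b, s_xx, s_xy


hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]
a, b, s_xx, s_xy = least_squares_line(hours, scores)
print(s_xx, s_xy)            # → 376.0 1305.0
print(round(a + b * 14, 2))  # → 70.28
```

With the unrounded coefficients the prediction is about 70.28 rather than 70.284; both round to a score of 70.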

The least squares criterion, or in other words, the method of least squares, 
is used in many problems of curve fitting which are more general than the one 
treated in this section. Above all, it will be used in Sections 14.6 and 14.7 to 
estimate the coefficients of multiple regression equations of the form 


$$\mu_{y|x_1, x_2, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$


THEORETICAL EXERCISES 


1. With reference to Example 14.1, show that the regression equation of x on
y is

$$\mu_{x|y} = \frac{2}{1 + y}$$

and sketch the regression curve.
2. Given the joint density

$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{for } 0 < x < 1 \text{ and } 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find $\mu_{y|x}$ and $\mu_{x|y}$.
3. Given the joint density

$$f(x, y) = \begin{cases} 6x & \text{for } 0 < x < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

find $\mu_{y|x}$ and $\mu_{x|y}$.
4. Given the joint density

$$f(x, y) = \begin{cases} \dfrac{2x}{(1 + x + xy)^3} & \text{for } x > 0 \text{ and } y > 0 \\ 0 & \text{elsewhere} \end{cases}$$

show that $\mu_{y|x} = 1 + \frac{1}{x}$ and that var(y|x) does not exist.


5. With reference to Exercise 2 on page 128, use the results of parts (c) and (d)
to find $\mu_{x|y}$ and $\mu_{y|x}$.


6. With reference to Exercise 3 on page 129, find an expression for $\mu_{y|x}$.
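7. Given the joint density

$$f(x, y) = \begin{cases} 2 & \text{for } 0 < y < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$

show that

(a) $\mu_{y|x} = \frac{x}{2}$ and $\mu_{x|y} = \frac{1 + y}{2}$;

(b) $E(x^m y^n) = \frac{2}{(n + 1)(m + n + 2)}$.

Also, verify the results of part (a) by substituting the values of $\sigma_1$, $\sigma_2$, and
ρ, obtained with the formula of part (b), into the formulas of Theorem 14.1.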


8. Given the joint density

$$f(x, y) = \begin{cases} 24xy & \text{for } x > 0,\ y > 0, \text{ and } x + y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

show that $\mu_{y|x} = \frac{2}{3}(1 - x)$ and verify this result by determining the values
of $\sigma_1$, $\sigma_2$, and ρ, and substituting them into the appropriate formula of
Theorem 14.1.


9. Given the joint density

$$f(x, y) = \begin{cases} 1 & \text{for } -y < x < y \text{ and } 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$

show that the random variables x and y are uncorrelated but not independent.


10. Show that if $\mu_{y|x}$ is linear in x and var(y|x) is constant, then
$\text{var}(y|x) = \sigma_2^2(1 - \rho^2)$.


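11. Given a pair of random variables x and y having the variances $\sigma_1^2$ and $\sigma_2^2$
and the correlation coefficient ρ, use Theorem 4.14 to express $\text{var}\left(\frac{x}{\sigma_1} + \frac{y}{\sigma_2}\right)$
and $\text{var}\left(\frac{x}{\sigma_1} - \frac{y}{\sigma_2}\right)$ in terms of $\sigma_1$, $\sigma_2$, and ρ. Then, making use of the fact
that variances cannot be negative, show that −1 ≤ ρ ≤ +1.

12. Given the random variables $x_1$, $x_2$, and $x_3$ having the joint density $f(x_1, x_2, x_3)$,
show that if the regression of $x_3$ on $x_1$ and $x_2$ is linear and written as

$$\mu_{x_3|x_1,x_2} = \alpha + \beta_1(x_1 - \mu_1) + \beta_2(x_2 - \mu_2)$$

then

$$\alpha = \mu_3, \quad \beta_1 = \frac{\sigma_{13}\sigma_2^2 - \sigma_{12}\sigma_{23}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2}, \quad \beta_2 = \frac{\sigma_{23}\sigma_1^2 - \sigma_{12}\sigma_{13}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2}$$

where $\mu_i = E(x_i)$, $\sigma_i^2 = \text{var}(x_i)$, and $\sigma_{ij} = \text{cov}(x_i, x_j)$. [Hint: Proceed as on
page 451, multiplying by $(x_1 - \mu_1)$ and $(x_2 - \mu_2)$, respectively, to obtain the
second and third equations.]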
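13. Find the least squares estimate of the parameter β in the regression equation
$\mu_{y|x} = \beta x$.

14. By solving the normal equations on page 454 simultaneously, show that

$$\hat{\alpha} = \frac{\left(\sum x_i^2\right)\left(\sum y_i\right) - \left(\sum x_i\right)\left(\sum x_i y_i\right)}{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2}$$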

APPLIED EXERCISES 


15. Various doses of a poison were given to groups of 25 mice and the following
results were observed:

Dose (mg)    Number of deaths
x            y
4            1
6            3
8            6
10           8
12           14
14           16
16           20
18           21

(a) Find the equation of the least squares line fitted to these data.
(b) Estimate the number of deaths in a group of 25 mice who receive a
7-mg dose of this poison.


16. The following are the grades which 12 students obtained in the mid-term and 
final examinations in a course in statistics: 


Mid-term Final 
examination examination 
x y 
71 83 
49 62 
80 76 
73 77 
93 89 
85 74 
58 48 
82 78 
64 76 
32 51 
87 73 
80 89 


(a) Find the equation of the least squares line which will enable us to predict a student's final examination grade in this course on the basis of his/her mid-term grade.


(b) Predict the final examination grade of a student who received an 84 on 
the mid-term examination. 


17. Raw material used in the production of a synthetic fiber is stored in a place which has no humidity control. Measurements of the relative humidity in the storage place and the moisture content of a sample of the raw material (both in percentages) on 12 days yield the following results:

Humidity   Moisture content
46         12
53         14
37         11
42         13
34         10
29         8
60         17
44         12
41         10
48         15
33         9
40         13

(a) Fit a least squares line from which we can predict the moisture content in terms of the humidity.
(b) Use the result of part (a) to estimate the moisture content when the relative humidity is 38 percent.


18. The following data pertain to the chlorine residual in a swimming pool at 
various times after it has been treated with chemicals: 


Number of hours   Chlorine residual (parts per million)
2                 1.8
4                 1.5
6                 1.4
8                 1.1
10                1.1
12                0.9

(a) Fit a least squares line from which we can predict the chlorine residual in terms of the number of hours since the pool has been treated with chemicals.

(b) Use the equation of the least squares line to estimate the chlorine 
residual in the pool five hours after it has been treated with chemicals. 
19. When the x's are equally spaced, the calculation of $\hat{\alpha}$ and $\hat{\beta}$ can be simplified by coding the x's by assigning them the values ..., −3, −2, −1, 0, 1, 2, 3, ... when n is odd, or the values ..., −5, −3, −1, 1, 3, 5, ... when n is even.

(a) Show that with this coding the formulas for $\hat{\alpha}$ and $\hat{\beta}$ become

$$\hat{\alpha} = \frac{\sum y}{n} \qquad \text{and} \qquad \hat{\beta} = \frac{\sum xy}{\sum x^2}$$

(b) Use this kind of coding to rework both parts of the preceding exercise.
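A short sketch of the coding in Exercise 19, assuming Python 3 and using the chlorine data of Exercise 18 (n = 6 is even, so the hours 2, 4, ..., 12 are coded −5, −3, ..., 5). It checks that $\hat{\alpha} = \sum y/n$ and $\hat{\beta} = \sum xy/\sum x^2$ on the coded scale give the same prediction as the ordinary least squares formulas on the original scale; the function names are ours, and the prediction itself is deliberately not printed so the exercise is left to the reader.

```python
def code_equally_spaced(xs):
    """Code equally spaced x's as in Exercise 19:
    ..., -1, 0, 1, ... for odd n; ..., -3, -1, 1, 3, ... for even n."""
    n = len(xs)
    step = xs[1] - xs[0]                  # common spacing of the x's
    mean = sum(xs) / n
    scale = (2 if n % 2 == 0 else 1) / step
    return [(x - mean) * scale for x in xs], mean, scale

def coded_fit(us, ys):
    """Least squares on the coded scale, where sum(us) == 0."""
    alpha = sum(ys) / len(ys)
    beta = sum(u * y for u, y in zip(us, ys)) / sum(u * u for u in us)
    return alpha, beta

def ordinary_fit(xs, ys):
    """Usual least squares estimates on the original scale."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return ybar - beta * xbar, beta

hours = [2, 4, 6, 8, 10, 12]
chlorine = [1.8, 1.5, 1.4, 1.1, 1.1, 0.9]

us, mean, scale = code_equally_spaced(hours)
a_c, b_c = coded_fit(us, chlorine)
a_o, b_o = ordinary_fit(hours, chlorine)

# Both fits must give the same prediction, e.g. five hours after treatment.
x0 = 5
pred_coded = a_c + b_c * (x0 - mean) * scale
pred_plain = a_o + b_o * x0
print(us)                                     # [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]
print(abs(pred_coded - pred_plain) < 1e-12)   # True
```

The coding only shifts and rescales x so that $\sum x = 0$, which is why the two normal equations decouple.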


20. During its first five years of operation, a company's gross income from sales was 1.4, 2.1, 2.6, 3.5, and 3.7 million dollars. Use the coding of the preceding exercise to fit a least squares line and, assuming that the trend continues, predict the company's gross income from sales during its sixth year of operation.


21. If a set of paired data gives the indication that the regression equation is of the form $\mu_{y|x} = \alpha \cdot \beta^x$, it is customary to estimate $\alpha$ and $\beta$ by fitting the line

$$\log \hat{y} = \log \hat{\alpha} + x \cdot \log \hat{\beta}$$

to the points $\{(x_i, \log y_i);\ i = 1, 2, \ldots, n\}$ by the method of least squares. Use this technique to fit an exponential curve of the form $\hat{y} = \hat{\alpha} \cdot \hat{\beta}^x$ to the following data on the growth of cactus grafts under controlled environmental conditions:

Weeks after grafting   Height (inches)
x                      y
1                      2.0
2                      2.4
4                      5
5                      7.3
6                      9.4
8                      18.3
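The technique of Exercise 21 is an ordinary least squares line fitted to $(x, \log y)$, followed by back-transformation. A minimal sketch, assuming Python 3 and base-10 logarithms (any base gives the same $\hat{\alpha}$ and $\hat{\beta}$ after back-transforming). To keep the check independent of the exercise data, it is tried on hypothetical points lying exactly on $y = 2 \cdot 1.5^x$, which the method should recover.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares line ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def fit_exponential(xs, ys):
    """Fit y-hat = alpha * beta**x by least squares on (x, log10 y)."""
    log_a, log_b = fit_line(xs, [math.log10(y) for y in ys])
    return 10 ** log_a, 10 ** log_b

# Hypothetical points exactly on y = 2 * 1.5**x (illustration only).
xs = [1, 2, 4, 5, 6, 8]
ys = [2 * 1.5 ** x for x in xs]
alpha, beta = fit_exponential(xs, ys)
print(round(alpha, 6), round(beta, 6))  # 2.0 1.5
```

Note that the least squares criterion is applied to the logarithms, so the fitted curve minimizes squared errors in $\log y$, not in y itself.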


22. If a set of paired data gives the indication that the regression equation is of the form $\mu_{y|x} = \alpha \cdot x^\beta$, it is customary to estimate $\alpha$ and $\beta$ by fitting the line

$$\log \hat{y} = \log \hat{\alpha} + \hat{\beta} \cdot \log x$$

to the points $\{(\log x_i, \log y_i);\ i = 1, 2, \ldots, n\}$ by the method of least squares.

(a) Use this technique to fit a power function of the form $\hat{y} = \hat{\alpha} \cdot x^{\hat{\beta}}$ to the following data on the unit cost of producing certain electronic components and the number of units produced:

Lot size   Unit cost (dollars)
x          y
50         108
100        53
250        24
500        9
1,000      5


(b) Use the result of part (a) to estimate the unit cost for a lot of 300 
components. 
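Likewise for Exercise 22: the power curve $\hat{y} = \hat{\alpha} \cdot x^{\hat{\beta}}$ comes from a straight line fitted to $(\log x, \log y)$. The slope of that line is $\hat{\beta}$ itself, while only the intercept needs back-transforming. Again a sketch in Python 3, tried on hypothetical points lying exactly on $y = 5000 \cdot x^{-0.9}$ rather than on the exercise data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares line ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def fit_power(xs, ys):
    """Fit y-hat = alpha * x**beta by least squares on (log10 x, log10 y)."""
    log_a, beta = fit_line([math.log10(x) for x in xs],
                           [math.log10(y) for y in ys])
    return 10 ** log_a, beta   # the slope is beta itself, not log(beta)

# Hypothetical points exactly on y = 5000 * x**(-0.9) (illustration only).
xs = [50, 100, 250, 500, 1000]
ys = [5000 * x ** -0.9 for x in xs]
alpha, beta = fit_power(xs, ys)
print(round(alpha, 3), round(beta, 6))  # 5000.0 -0.9
```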


14.4 NORMAL REGRESSION ANALYSIS

When we analyze a set of paired data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ by regression analysis, we assume that the $x_i$ are constants while the $y_i$ are values of corresponding independent random variables $y_i$. This clearly differs from correlation analysis, which we shall take up in Section 14.5, where the $x_i$ and the $y_i$ are values of corresponding random variables $x_i$ and $y_i$. For example, if we want to analyze data on the ages and prices of used cars, treating the ages as known constants and the prices as values of random variables, this is a problem of regression analysis. On the other hand, if we want to analyze data on the height and weight of certain animals, and height and weight are both looked upon as random variables, this is a problem of correlation analysis.

This section will be devoted to some of the basic problems of normal regression analysis, where it is assumed that for each fixed $x_i$ the conditional density of the corresponding random variable $y_i$ is the normal density

$$w(y_i | x_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{y_i - (\alpha + \beta x_i)}{\sigma}\right]^2}, \qquad -\infty < y_i < \infty$$

where $\alpha$, $\beta$, and $\sigma$ are the same for each i. Given a random sample of such paired data, normal regression analysis concerns itself mainly with the estimation of $\sigma$ and the regression coefficients $\alpha$ and $\beta$, with tests of hypotheses concerning these three parameters, and with predictions based on the estimated regression equation $\hat{y} = \hat{\alpha} + \hat{\beta}x$, where $\hat{\alpha}$ and $\hat{\beta}$ are estimates of $\alpha$ and $\beta$.
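To obtain maximum likelihood estimates of the parameters $\alpha$, $\beta$, and $\sigma$, we partially differentiate the likelihood function (or its logarithm, which is easier) with respect to $\alpha$, $\beta$, and $\sigma$, equate the expressions to zero, and then solve the resulting system of equations. Thus, differentiating

$$\ln L = -n \cdot \ln \sigma - \frac{n}{2} \cdot \ln 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n} [y_i - (\alpha + \beta x_i)]^2$$

partially with respect to $\alpha$, $\beta$, and $\sigma$, and equating the expressions which we obtain to zero, we get

$$\frac{\partial \ln L}{\partial \alpha} = \frac{1}{\sigma^2} \sum_{i=1}^{n} [y_i - (\alpha + \beta x_i)] = 0$$

$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i[y_i - (\alpha + \beta x_i)] = 0$$

$$\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} [y_i - (\alpha + \beta x_i)]^2 = 0$$

Since the first two equations are equivalent to the two normal equations on page 454, the maximum likelihood estimates of $\alpha$ and $\beta$ are identical with the least squares estimates of Theorem 14.2. Also, if we substitute these estimates of $\alpha$ and $\beta$ into the equation obtained by equating $\frac{\partial \ln L}{\partial \sigma}$ to zero, it follows immediately that the maximum likelihood estimate of $\sigma$ is given by

$$\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} [y_i - (\hat{\alpha} + \hat{\beta} x_i)]^2}$$

This can also be written as

$$\hat{\sigma} = \sqrt{\frac{1}{n}\left(S_{yy} - \hat{\beta} \cdot S_{xy}\right)}$$

as the reader will be asked to verify in Exercise 1 on page 469.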
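A numerical check of the two expressions for $\hat{\sigma}$ (Exercise 1 on page 469 asks for the general proof). The sketch, in Python 3 with the standard library only, fits a least squares line to the morning/afternoon times listed in Example 14.7 of Section 14.5, used here merely as a convenient data set, and computes $\hat{\sigma}$ both from the residuals and from $S_{yy} - \hat{\beta} S_{xy}$. The sixth morning time is taken as 7.1, the value required by the sums $\sum x = 86.7$ and $\sum x^2 = 771.35$ quoted in that example.

```python
import math

def mle_sigma_two_ways(xs, ys):
    """Return sigma-hat computed from residuals and from the shortcut."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    # sigma-hat from the residual sum of squares (divisor n, not n - 2)
    rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    from_residuals = math.sqrt(rss / n)
    # sigma-hat from the computing formula (S_yy - beta-hat * S_xy) / n
    from_shortcut = math.sqrt((syy - beta * sxy) / n)
    return from_residuals, from_shortcut

morning = [8.2, 9.6, 7.0, 9.4, 10.9, 7.1, 9.0, 6.6, 8.4, 10.5]
afternoon = [8.7, 9.6, 6.9, 8.5, 11.3, 7.6, 9.2, 6.3, 8.4, 12.3]
s1, s2 = mle_sigma_two_ways(morning, afternoon)
print(abs(s1 - s2) < 1e-9)  # True
```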

Having obtained maximum likelihood estimators of the regression coefficients, let us now investigate their use in testing hypotheses concerning $\alpha$ and $\beta$, and in constructing confidence intervals for these two parameters. Since problems concerning $\beta$ are usually of more immediate interest than problems concerning $\alpha$ ($\beta$ is the slope of the regression line while $\alpha$ is merely the y-intercept; also, the null hypothesis $\beta = 0$ is equivalent to the null hypothesis $\rho = 0$), we shall discuss here some of the sampling theory relating to $\hat{\beta}$, while leaving corresponding theory relating to $\hat{\alpha}$ to Exercises 4 and 6 on page 469.
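To study the sampling distribution of $\hat{\beta}$, let us write

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{S_{xx}}\, y_i$$

(the term in $\bar{y}$ drops out because $\sum (x_i - \bar{x}) = 0$), which is seen to be a linear combination of the n independent normal random variables $y_i$. It follows from Exercise 6 on page 269 that $\hat{\beta}$ itself has a normal distribution with the mean

$$E(\hat{\beta}) = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{S_{xx}}\, E(y_i | x_i) = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{S_{xx}} (\alpha + \beta x_i) = \beta$$

and the variance

$$\text{var}(\hat{\beta}) = \sum_{i=1}^{n} \left[\frac{x_i - \bar{x}}{S_{xx}}\right]^2 \text{var}(y_i | x_i) = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{S_{xx}^2} \cdot \sigma^2 = \frac{\sigma^2}{S_{xx}}$$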
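The results $E(\hat{\beta}) = \beta$ and $\text{var}(\hat{\beta}) = \sigma^2/S_{xx}$ can be checked by simulation. A minimal Monte Carlo sketch, assuming Python 3's `random` and `statistics` modules and the hypothetical values $\alpha = 2$, $\beta = 0.5$, $\sigma = 1.5$ with x = 1, 2, ..., 10 (none of these come from the text). It draws many samples from the normal regression model, computes $\hat{\beta}$ for each, and compares the empirical mean and variance of the $\hat{\beta}$'s with $\beta$ and $\sigma^2/S_{xx}$.

```python
import random
import statistics

def beta_hat(xs, ys):
    """Least squares slope S_xy / S_xx."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx

alpha, beta, sigma = 2.0, 0.5, 1.5       # hypothetical model parameters
xs = list(range(1, 11))                  # fixed (non-random) x's
xbar = sum(xs) / len(xs)
sxx = sum((x - xbar) ** 2 for x in xs)   # 82.5 for x = 1, ..., 10

rng = random.Random(14)                  # fixed seed for reproducibility
estimates = []
for _ in range(20000):
    ys = [rng.gauss(alpha + beta * x, sigma) for x in xs]
    estimates.append(beta_hat(xs, ys))

print(statistics.fmean(estimates), beta)               # close to 0.5
print(statistics.variance(estimates), sigma**2 / sxx)  # close to 0.0273
```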


In order to apply this theory to test hypotheses about $\beta$ or construct confidence intervals for $\beta$, we shall have to use the following theorem:

THEOREM 14.3 Under the assumptions of normal regression analysis, $\frac{n\hat{\sigma}^2}{\sigma^2}$ has the chi-square distribution with n − 2 degrees of freedom. Furthermore, $\frac{n\hat{\sigma}^2}{\sigma^2}$ and $\hat{\beta}$ are independent.

A proof of this theorem is referred to on page 496.

Making use of this theorem as well as the result proved earlier that $\hat{\beta}$ has a normal distribution with the mean $\beta$ and the variance $\frac{\sigma^2}{S_{xx}}$, we find that the definition of the t distribution in Section 8.5 leads to
THEOREM 14.4 Under the assumptions of normal regression analysis,

$$t = \frac{\dfrac{\hat{\beta} - \beta}{\sigma/\sqrt{S_{xx}}}}{\sqrt{\dfrac{n\hat{\sigma}^2}{\sigma^2(n-2)}}} = \frac{\hat{\beta} - \beta}{\hat{\sigma}} \sqrt{\frac{(n-2)S_{xx}}{n}}$$

is a value of a random variable having the t distribution with n − 2 degrees of freedom.


Based on this statistic, let us now test a hypothesis about the regression coefficient $\beta$.

EXAMPLE 14.5

With reference to the data on page 453 pertaining to the amount of time that ten persons studied for a test and their scores, test the null hypothesis $\beta = 3$ against the alternative hypothesis $\beta > 3$ at the 0.01 level of significance.

Solution
1. $H_0$: $\beta = 3$
   $H_1$: $\beta > 3$
2. Reject the null hypothesis if t > 2.896, where t is determined in accordance with Theorem 14.4, and 2.896 is the value of $t_{0.01,8}$ according to Table IV.
3. Calculating $\sum y^2 = 36{,}562$ from the original data and copying the other quantities from page 456, we get

$$S_{yy} = 36{,}562 - \frac{1}{10}(564)^2 = 4{,}752.4$$

and

$$\hat{\sigma} = \sqrt{\frac{1}{10}\left[4{,}752.4 - (3.471)(1{,}305)\right]} = 4.720$$

so that

$$t = \frac{3.471 - 3}{4.720}\sqrt{\frac{8 \cdot 376}{10}} = 1.73$$

4. Since t = 1.73 is less than 2.896, the null hypothesis cannot be rejected; we cannot conclude that on the average an extra hour of study will increase the score by more than 3 points. ▲
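The arithmetic of Example 14.5 can be scripted directly from the summary quantities quoted there (n = 10, $S_{xx} = 376$, $S_{xy} = 1{,}305$, $S_{yy} = 4{,}752.4$). A minimal sketch assuming Python 3; the critical value $t_{0.01,8} = 2.896$ is copied from Table IV rather than computed, to avoid depending on a statistics library, and the function name is ours. Because it does not round $\hat{\beta}$ to 3.471 first, $\hat{\sigma}$ comes out as about 4.723 instead of 4.720, while t still rounds to 1.73.

```python
import math

def slope_t_statistic(n, sxx, sxy, syy, beta0):
    """t statistic of Theorem 14.4 for H0: beta = beta0."""
    beta_hat = sxy / sxx
    sigma_hat = math.sqrt((syy - beta_hat * sxy) / n)   # MLE of sigma
    t = (beta_hat - beta0) / sigma_hat * math.sqrt((n - 2) * sxx / n)
    return beta_hat, sigma_hat, t

beta_hat, sigma_hat, t = slope_t_statistic(n=10, sxx=376, sxy=1305,
                                           syy=4752.4, beta0=3)
t_crit = 2.896                      # t_{0.01,8}, from Table IV
print(round(beta_hat, 3), round(sigma_hat, 3), round(t, 2))  # 3.471 4.723 1.73
print("reject H0" if t > t_crit else "cannot reject H0")     # cannot reject H0
```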


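From Theorem 14.4 we have

$$P\left(-t_{\alpha/2,n-2} < \frac{\hat{\beta} - \beta}{\hat{\sigma}}\sqrt{\frac{(n-2)S_{xx}}{n}} < t_{\alpha/2,n-2}\right) = 1 - \alpha$$

or, equivalently,

$$P\left(\hat{\beta} - t_{\alpha/2,n-2} \cdot \hat{\sigma}\sqrt{\frac{n}{(n-2)S_{xx}}} < \beta < \hat{\beta} + t_{\alpha/2,n-2} \cdot \hat{\sigma}\sqrt{\frac{n}{(n-2)S_{xx}}}\right) = 1 - \alpha$$

which leads to the following theorem:

THEOREM 14.5 Under the assumptions of normal regression analysis, a (1 − α)100% confidence interval for the parameter $\beta$ is given by

$$\hat{\beta} - t_{\alpha/2,n-2} \cdot \hat{\sigma}\sqrt{\frac{n}{(n-2)S_{xx}}} < \beta < \hat{\beta} + t_{\alpha/2,n-2} \cdot \hat{\sigma}\sqrt{\frac{n}{(n-2)S_{xx}}}$$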


EXAMPLE 14.6

With reference to the test scores and hours studied on page 453, construct a 95% confidence interval for $\beta$.

Solution
Copying the various quantities from pages 456 and 466, and substituting them together with $t_{0.025,8} = 2.306$ into the confidence interval formula of Theorem 14.5, we get

$$3.471 - (2.306)(4.720)\sqrt{\frac{10}{8(376)}} < \beta < 3.471 + (2.306)(4.720)\sqrt{\frac{10}{8(376)}}$$

or

$$2.84 < \beta < 4.10$$ ▲
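Theorem 14.5 can be sketched in a few lines of Python 3. The critical value $t_{0.025,8} = 2.306$ is passed in by hand, as quoted in the text, rather than computed, and the function name is ours. Using the same unrounded summary quantities as in Example 14.5, it reproduces the interval $2.84 < \beta < 4.10$ to two decimals.

```python
import math

def slope_confidence_interval(n, sxx, sxy, syy, t_crit):
    """(1 - alpha)100% confidence interval for beta, per Theorem 14.5.

    t_crit is t_{alpha/2, n-2}, supplied by the caller from a t table.
    """
    beta_hat = sxy / sxx
    sigma_hat = math.sqrt((syy - beta_hat * sxy) / n)
    half_width = t_crit * sigma_hat * math.sqrt(n / ((n - 2) * sxx))
    return beta_hat - half_width, beta_hat + half_width

lo, hi = slope_confidence_interval(n=10, sxx=376, sxy=1305,
                                   syy=4752.4, t_crit=2.306)
print(f"{lo:.2f} < beta < {hi:.2f}")  # 2.84 < beta < 4.10
```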
Since most realistically complex regression problems require fairly extensive calculations, they are virtually always done nowadays by using appropriate computer software. A printout thus obtained for our illustration is shown in Figure 14.4; as can be seen, it provides not only the values of $\hat{\alpha}$ and $\hat{\beta}$ in the column headed "COEFFICIENT," but also estimates of the standard deviations of the sampling distributions of $\hat{\alpha}$ and $\hat{\beta}$ in the column headed "ST. DEV. OF COEF." and other relevant information. Had we used this printout in Example 14.5, we could have written the value of the t statistic directly as

$$t = \frac{3.471 - 3}{0.272} = 1.73$$

and in Example 14.6 we could have written the confidence limits directly as $3.471 \pm (2.306)(0.272)$.


MTB > NAME C1 = 'X'
MTB > NAME C2 = 'Y'
MTB > SET C1
DATA> 4 9 10 14 4 7 12 22 1 17
MTB > SET C2
DATA> 31 58 65 73 37 44 60 91 21 84
MTB > REGR C2 1 C1

THE REGRESSION EQUATION IS
Y = 21.7 + 3.47 X

                          ST. DEV.    T-RATIO =
  COLUMN    COEFFICIENT   OF COEF.    COEF/S.D.
                 21.693      3.194         6.79
  X              3.4707     0.2723        12.74

S = 5.281

R-SQUARED = 95.3 PERCENT
R-SQUARED = 94.7 PERCENT, ADJUSTED FOR D.F.

ANALYSIS OF VARIANCE

  DUE TO        DF        SS    MS=SS/DF
  REGRESSION     1    4529.3      4529.3
  RESIDUAL       8     223.1        27.9
  TOTAL          9    4752.4

DURBIN-WATSON STATISTIC = 1.04

Figure 14.4 Computer printout for linear regression example.
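Most of the numbers in Figure 14.4 follow from quantities already in the text. The Python 3 sketch below takes n = 10, $\bar{y} = 56.4$, $S_{xx} = 376$, $S_{xy} = 1{,}305$, $S_{yy} = 4{,}752.4$, and $\bar{x} = 10$ (the value implied by $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 21.693$). It assumes that the printout's S is $s_e = \sqrt{n\hat{\sigma}^2/(n - 2)}$, the standard error of estimate of Exercise 2 on page 469, and uses $\text{var}(\hat{\alpha}) = \sigma^2(1/n + \bar{x}^2/S_{xx})$ from Exercise 4.

```python
import math

n, xbar, ybar = 10, 10.0, 56.4
sxx, sxy, syy = 376.0, 1305.0, 4752.4

beta_hat = sxy / sxx
alpha_hat = ybar - beta_hat * xbar
sse = syy - beta_hat * sxy                 # residual sum of squares
s_e = math.sqrt(sse / (n - 2))             # "S" in the printout
se_beta = s_e / math.sqrt(sxx)             # ST. DEV. OF COEF. for X
se_alpha = s_e * math.sqrt(1 / n + xbar ** 2 / sxx)  # ... for the constant
r_squared = 1 - sse / syy

print(f"{alpha_hat:.3f} {se_alpha:.3f} {alpha_hat / se_alpha:.2f}")  # 21.693 3.194 6.79
print(f"{beta_hat:.4f} {se_beta:.4f} {beta_hat / se_beta:.2f}")      # 3.4707 0.2723 12.74
print(f"S = {s_e:.3f}  R-SQUARED = {100 * r_squared:.1f} PERCENT")   # S = 5.281  R-SQUARED = 95.3 PERCENT
```

The T-RATIO column is simply each coefficient divided by its standard deviation, that is, the t statistic for testing whether that coefficient is zero.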
THEORETICAL EXERCISES

1. Making use of the fact that $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and $\hat{\beta} = \frac{S_{xy}}{S_{xx}}$, show that

$$\sum_{i=1}^{n} [y_i - (\hat{\alpha} + \hat{\beta}x_i)]^2 = S_{yy} - \hat{\beta}S_{xy}$$

2. Show that
(a) $\hat{\sigma}^2$ is not an unbiased estimator of $\sigma^2$;
(b) $s_e^2 = \frac{n \cdot \hat{\sigma}^2}{n - 2}$ is an unbiased estimator of $\sigma^2$.
The quantity $s_e$ is often referred to as the standard error of estimate.

3. Using $s_e$ (see Exercise 2) instead of $\hat{\sigma}$, rewrite
(a) the expression for the t statistic of Theorem 14.4;
(b) the confidence interval formula of Theorem 14.5.

4. Under the assumptions of normal regression analysis, show that
(a) the least squares estimate of $\alpha$ in Theorem 14.2 can be written in the form

$$\hat{\alpha} = \sum_{i=1}^{n} \frac{S_{xx} + n\bar{x}^2 - n\bar{x}x_i}{nS_{xx}}\, y_i$$

(b) $\hat{\alpha}$ has a normal distribution with

$$E(\hat{\alpha}) = \alpha \quad \text{and} \quad \text{var}(\hat{\alpha}) = \frac{(S_{xx} + n\bar{x}^2)\sigma^2}{nS_{xx}}$$

5. Use Theorem 4.15 to show that

$$\text{cov}(\hat{\alpha}, \hat{\beta}) = -\frac{\bar{x}\sigma^2}{S_{xx}}$$

6. Use the result of part (b) of Exercise 4 to show that

$$z = \frac{(\hat{\alpha} - \alpha)\sqrt{nS_{xx}}}{\sigma\sqrt{S_{xx} + n\bar{x}^2}}$$

is a value of a random variable having the standard normal distribution. Also, use the first part of Theorem 14.3 and the fact that $\hat{\alpha}$ and $\frac{n\hat{\sigma}^2}{\sigma^2}$ are independent to show that

$$t = \frac{(\hat{\alpha} - \alpha)\sqrt{(n-2)S_{xx}}}{\hat{\sigma}\sqrt{S_{xx} + n\bar{x}^2}}$$

is a value of a random variable having the t distribution with n − 2 degrees of freedom.


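7. Use the results of Exercises 4 and 5 and the fact that $E(\hat{\beta}) = \beta$ and $\text{var}(\hat{\beta}) = \frac{\sigma^2}{S_{xx}}$ to show that $\hat{y}_0 = \hat{\alpha} + \hat{\beta}x_0$ is a random variable having a normal distribution with the mean

$$\alpha + \beta x_0 = \mu_{y|x_0}$$

and the variance

$$\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$$

Also, use the first part of Theorem 14.3 as well as the fact that $\hat{y}_0$ and $\frac{n\hat{\sigma}^2}{\sigma^2}$ are independent to show that

$$t = \frac{(\hat{y}_0 - \mu_{y|x_0})\sqrt{n-2}}{\hat{\sigma}\sqrt{1 + \dfrac{n(x_0 - \bar{x})^2}{S_{xx}}}}$$

is a value of a random variable having the t distribution with n − 2 degrees of freedom.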


8. Derive a (1 − α)100% confidence interval formula for $\mu_{y|x_0}$, the mean of y at $x = x_0$, by solving the double inequality $-t_{\alpha/2,n-2} < t < t_{\alpha/2,n-2}$, with t given by the formula of the preceding exercise.


9. Use the results of Exercises 4 and 5 and the fact that $E(\hat{\beta}) = \beta$ and $\text{var}(\hat{\beta}) = \frac{\sigma^2}{S_{xx}}$ to show that $y_0 - (\hat{\alpha} + \hat{\beta}x_0)$ is a random variable having a normal distribution with zero mean and the variance

$$\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$$

where $y_0$ has a normal distribution with the mean $\alpha + \beta x_0$ and the variance $\sigma^2$; that is, $y_0$ is a future observation of y corresponding to $x = x_0$. Also, use the first part of Theorem 14.3 as well as the fact that $y_0 - (\hat{\alpha} + \hat{\beta}x_0)$ and $\frac{n\hat{\sigma}^2}{\sigma^2}$ are independent to show that

$$t = \frac{[y_0 - (\hat{\alpha} + \hat{\beta}x_0)]\sqrt{n-2}}{\hat{\sigma}\sqrt{1 + n + \dfrac{n(x_0 - \bar{x})^2}{S_{xx}}}}$$

is a value of a random variable having the t distribution with n − 2 degrees of freedom.

10. Solve the double inequality $-t_{\alpha/2,n-2} < t < t_{\alpha/2,n-2}$, with t given by the formula of the preceding exercise, so that the middle term is $y_0$ and the two limits can be calculated without knowledge of $y_0$. Note that although the resulting double inequality may be interpreted like a confidence interval, it is not designed to estimate a parameter; instead, it provides limits of prediction for a future observation of y which corresponds to the (given or observed) value $x_0$.
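Once the formulas asked for in Exercises 8 and 10 have been derived, they can be evaluated mechanically. A sketch in Python 3, which solves the double inequalities built from the t statistics of Exercises 7 and 9 for $\mu_{y|x_0}$ and $y_0$, respectively, working only from summary quantities. The critical value is supplied by the caller, the function name is ours, and the inputs in the usage line are purely hypothetical (2.228 is $t_{0.025,10}$, matching n = 12), so the applied exercises below are left intact.

```python
import math

def mean_and_prediction_limits(n, xbar, sxx, alpha_hat, beta_hat,
                               sigma_hat, x0, t_crit):
    """Limits for mu_{y|x0} (Exercise 8) and for a future y0 (Exercise 10).

    sigma_hat is the maximum likelihood estimate (divisor n);
    t_crit is t_{alpha/2, n-2}, supplied by the caller.
    """
    y0_hat = alpha_hat + beta_hat * x0
    lever = n * (x0 - xbar) ** 2 / sxx
    half_mean = t_crit * sigma_hat * math.sqrt((1 + lever) / (n - 2))
    half_pred = t_crit * sigma_hat * math.sqrt((1 + n + lever) / (n - 2))
    return ((y0_hat - half_mean, y0_hat + half_mean),
            (y0_hat - half_pred, y0_hat + half_pred))

# Hypothetical inputs, for illustration only.
conf, pred = mean_and_prediction_limits(n=12, xbar=5.0, sxx=40.0,
                                        alpha_hat=1.0, beta_hat=2.0,
                                        sigma_hat=0.8, x0=6.0, t_crit=2.228)
print(conf)
print(pred)
```

Both intervals are centered at $\hat{y}_0$; the limits of prediction are always wider, because they must also allow for the variance $\sigma^2$ of the future observation itself.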


APPLIED EXERCISES 


11. With reference to Exercise 15 on page 459, test the null hypothesis $\beta = 1.25$ against the alternative hypothesis $\beta > 1.25$ at the 0.01 level of significance.

12. With reference to Exercise 17 on page 460, test the null hypothesis $\beta = 0.350$ against the alternative hypothesis $\beta < 0.350$ at the 0.05 level of significance.

13. The following table shows the assessed values and the selling prices of eight houses, constituting a random sample of all the houses sold recently in a metropolitan area:

Assessed value            Selling price
(thousands of dollars)    (thousands of dollars)
40.3                      63.4
72.0                      118.3
32.5                      55.2
44.8                      74.0
27.9                      48.8
51.6                      81.1
80.4                      123.2
58.0                      92.5
Fit a least squares line which will enable us to predict the selling price of a house in terms of its assessed value and test the null hypothesis $\beta = 1.30$ against the alternative hypothesis $\beta > 1.30$ at the 0.05 level of significance.

14. With reference to Exercise 16 on page 460, construct a 99% confidence interval for the regression coefficient $\beta$.

15. With reference to Exercise 18 on page 461, construct a 98% confidence interval for the regression coefficient $\beta$.

16. With reference to Example 14.4 on page 456, use the theory of Exercise 6 to test the null hypothesis $\alpha = 21.50$ against the alternative hypothesis $\alpha \ne 21.50$ at the 0.01 level of significance.

17. The following data show the advertising expenses (expressed as a percentage of total expenses) and the net operating profits (expressed as a percentage of total sales) in a random sample of six drugstores:

Advertising expenses   Net operating profits
1.5                    3.6
1.0                    2.8
2.8                    5.4
0.4                    1.9
1.3                    2.9
2.0                    4.3

Fit a least squares line which will enable us to predict net operating profits in terms of advertising expenses and test the null hypothesis $\alpha = 0.8$ against the alternative hypothesis $\alpha > 0.8$ at the 0.01 level of significance.


18. With reference to Exercise 15 on page 459, use the theory of Exercise 6 to construct a 95% confidence interval for $\alpha$.

19. With reference to Exercise 16 on page 460, use the theory of Exercise 6 to construct a 99% confidence interval for $\alpha$.

20. Use the theory of Exercises 8 and 10 (as well as the quantities already calculated in Examples 14.4 and 14.5) to construct
(a) a 95% confidence interval for the mean test score of persons who have studied 14 hours for the test;
(b) 95% limits of prediction for the test score of a person who has studied 14 hours for the test.

21. Use the theory of Exercises 8 and 10 and the data of Exercise 15 on page 459 to find
(a) a 99% confidence interval for the expected number of deaths in a group of 25 mice when the dosage is 9 mg;
(b) 99% limits of prediction of the number of deaths in a group of 25 mice when the dosage is 9 mg.


14.5 NORMAL CORRELATION ANALYSIS

In normal correlation analysis we analyze a set of paired data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, where the $x_i$'s and $y_i$'s are values of a random sample from a bivariate normal population with the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$. To estimate these parameters by the method of maximum likelihood, we shall have to maximize the likelihood

$$L = \prod_{i=1}^{n} f(x_i, y_i)$$

where $f(x_i, y_i)$ is given by Definition 6.8, and to this end we shall have to differentiate L, or ln L, partially with respect to $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$, equate the resulting expressions to zero, and then solve the resulting system of equations for the five parameters. Leaving the details to the reader, let us merely state that when $\frac{\partial \ln L}{\partial \mu_1}$ and $\frac{\partial \ln L}{\partial \mu_2}$ are equated to zero, we get

$$-\frac{\sum_{i=1}^{n}(x_i - \mu_1)}{\sigma_1^2} + \frac{\rho \sum_{i=1}^{n}(y_i - \mu_2)}{\sigma_1\sigma_2} = 0$$

and

$$\frac{\rho \sum_{i=1}^{n}(x_i - \mu_1)}{\sigma_1\sigma_2} - \frac{\sum_{i=1}^{n}(y_i - \mu_2)}{\sigma_2^2} = 0$$

Solving these two equations for $\mu_1$ and $\mu_2$, we find that the maximum likelihood estimates of these two parameters are

$$\hat{\mu}_1 = \bar{x} \quad \text{and} \quad \hat{\mu}_2 = \bar{y}$$

namely, the respective sample means. Subsequently, equating $\frac{\partial \ln L}{\partial \sigma_1}$, $\frac{\partial \ln L}{\partial \sigma_2}$, and $\frac{\partial \ln L}{\partial \rho}$ to zero and substituting $\bar{x}$ and $\bar{y}$ for $\mu_1$ and $\mu_2$, we obtain a system of equations whose solution is

$$\hat{\sigma}_1 = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}, \qquad \hat{\sigma}_2 = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n}}$$

$$\hat{\rho} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

(A detailed derivation of these maximum likelihood estimates is referred to at the end of this chapter.) It is of interest to note that the maximum likelihood estimates of $\sigma_1$ and $\sigma_2$ are identical with the one obtained on page 354 for the standard deviation of the univariate normal distribution; they differ from the respective sample standard deviations $s_1$ and $s_2$ only by the factor $\sqrt{\frac{n-1}{n}}$.

The estimate $\hat{\rho}$, called the sample correlation coefficient, is usually denoted by the letter r, and its calculation is facilitated by using the following alternative, but equivalent, computing formula:

THEOREM 14.6 If $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ are the values of a random sample from a bivariate population, then

$$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$$

Since $\rho$ measures the strength of the linear relationship between x and y, there are many problems in which the estimation of $\rho$ and tests concerning $\rho$ are of special interest. When $\rho = 0$, the two random variables are uncorrelated, and as we have already seen, in the case of the bivariate normal distribution this means that they are also independent. When $\rho$ equals +1 or −1, it follows from the relationship

$$\sigma_{y|x}^2 = \sigma^2 = \sigma_2^2(1 - \rho^2)$$

established in Theorem 6.9, that $\sigma = 0$, and this means that there is a perfect linear relationship between x and y. Using the invariance property of maximum likelihood estimators, we can write

$$\hat{\sigma}^2 = \hat{\sigma}_2^2(1 - r^2)$$

which not only provides an alternative computing formula for finding $\hat{\sigma}^2$, but also serves to tie together the concepts of regression and correlation. From this formula for $\hat{\sigma}^2$ it is clear that when $\hat{\sigma}^2 = 0$, namely, when the set of data points $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ fall on a straight line, then r will equal +1 or −1, depending on whether the line has an upward or downward slope. In order to interpret values of r between 0 and +1 or 0 and −1, we solve the preceding equation for $r^2$ and multiply by 100, getting

$$100r^2 = \frac{\hat{\sigma}_2^2 - \hat{\sigma}^2}{\hat{\sigma}_2^2} \cdot 100$$

where $\hat{\sigma}_2^2$ measures the total variation of the y's, $\hat{\sigma}^2$ measures the conditional variation of the y's for fixed values of x, and, hence, $\hat{\sigma}_2^2 - \hat{\sigma}^2$ measures that part of the total variation of the y's which is accounted for by the relationship with x. Thus, $100r^2$ is the percentage of the total variation of the y's which is accounted for by the relationship with x. For instance, when r = 0.5 then 25 percent of the variation of the y's is accounted for by the relationship with x, when r = 0.7 then 49 percent of the variation of the y's is accounted for by the relationship with x, and we might thus say that a correlation of r = 0.7 is almost "twice as strong" as a correlation of r = 0.5. Similarly, we might say that a correlation of r = 0.6 is "nine times as strong" as a correlation of r = 0.2.


EXAMPLE 14.7

Suppose that we want to determine on the basis of the following data whether there is a relationship between the time, in minutes, it takes a secretary to complete a certain form in the morning and in the late afternoon:

Morning   Afternoon
x         y
8.2       8.7
9.6       9.6
7.0       6.9
9.4       8.5
10.9      11.3
7.1       7.6
9.0       9.2
6.6       6.3
8.4       8.4
10.5      12.3

Compute and interpret the sample correlation coefficient.
Solution
From the data we get n = 10, $\sum x = 86.7$, $\sum x^2 = 771.35$, $\sum y = 88.8$, $\sum y^2 = 819.34$, and $\sum xy = 792.92$, so that

$$S_{xx} = 771.35 - \frac{1}{10}(86.7)^2 = 19.661$$
$$S_{yy} = 819.34 - \frac{1}{10}(88.8)^2 = 30.796$$
$$S_{xy} = 792.92 - \frac{1}{10}(86.7)(88.8) = 23.024$$

and

$$r = \frac{23.024}{\sqrt{(19.661)(30.796)}} = 0.936$$

This is indicative of a positive association between the time it takes a secretary to perform the given task in the morning and in the late afternoon, and this is also apparent from the scattergram of Figure 14.5. Since $100r^2 = 100(0.936)^2 = 87.6$, we can say that almost 88% of the variation of the y's is accounted for by a linear relationship with x. ▲


Figure 14.5 Scattergram of data of Example 14.7 (x: morning, in minutes; y: afternoon, in minutes).
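Theorem 14.6 applied to the data of Example 14.7, as a short Python 3 sketch using only the standard library. The sixth morning time is taken as 7.1, the value required by the sums $\sum x = 86.7$ and $\sum x^2 = 771.35$ quoted in the solution. The sketch recomputes $S_{xx}$, $S_{yy}$, $S_{xy}$, and $r = S_{xy}/\sqrt{S_{xx} \cdot S_{yy}}$; the unrounded r is about 0.9357, which rounds to the 0.936 of the text.

```python
import math

morning = [8.2, 9.6, 7.0, 9.4, 10.9, 7.1, 9.0, 6.6, 8.4, 10.5]    # x
afternoon = [8.7, 9.6, 6.9, 8.5, 11.3, 7.6, 9.2, 6.3, 8.4, 12.3]  # y

def sample_correlation(xs, ys):
    """Return S_xx, S_yy, S_xy, and r by the computing formulas."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxx, syy, sxy, sxy / math.sqrt(sxx * syy)

sxx, syy, sxy, r = sample_correlation(morning, afternoon)
print(f"{sxx:.3f} {syy:.3f} {sxy:.3f}")             # 19.661 30.796 23.024
print(f"r = {r:.3f}, 100 r^2 = {100 * r * r:.1f}")  # r = 0.936, 100 r^2 = 87.6
```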
Since the sampling distribution of r for random samples from bivariate normal populations is rather complicated, it is common practice to base confidence intervals for $\rho$ and tests concerning $\rho$ on the statistic

$$\frac{1}{2} \cdot \ln\frac{1 + r}{1 - r}$$

whose distribution is approximately normal with the mean $\frac{1}{2} \cdot \ln\frac{1 + \rho}{1 - \rho}$ and the variance $\frac{1}{n - 3}$. Thus,

$$z = \frac{\sqrt{n - 3}}{2} \cdot \ln\frac{(1 + r)(1 - \rho)}{(1 - r)(1 + \rho)}$$

can be looked upon as a value of a random variable having approximately the standard normal distribution. Using this approximation, we can test the null hypothesis $\rho = \rho_0$ against an appropriate alternative as illustrated in Example 14.8 below, or calculate confidence intervals for $\rho$ by the method suggested in Exercise 4 on page 478.


EXAMPLE 14.8

With reference to Example 14.7, test the null hypothesis $\rho = 0$ against the alternative hypothesis $\rho \ne 0$ at the 0.01 level of significance.

Solution
1. $H_0$: $\rho = 0$
   $H_1$: $\rho \ne 0$
2. Reject the null hypothesis if z < −2.575 or z > 2.575, where

$$z = \frac{\sqrt{n - 3}}{2} \cdot \ln\frac{1 + r}{1 - r}$$

3. Substituting n = 10 and r = 0.936, we get

$$z = \frac{\sqrt{7}}{2} \cdot \ln\frac{1.936}{0.064} = 4.5$$

4. Since z = 4.5 exceeds 2.575, the null hypothesis must be rejected; we conclude that there is a relationship between the time it takes a secretary to complete the form in the morning and in the late afternoon. ▲
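The test of Example 14.8 as a Python 3 sketch. The function name is ours, and the critical value $z_{0.005} = 2.575$ is copied from the text rather than computed. The optional `rho0` argument covers the general null hypothesis $\rho = \rho_0$.

```python
import math

def fisher_z(r, n, rho0=0.0):
    """z statistic based on (1/2) ln((1 + r)/(1 - r)), approx. N(0, 1)."""
    return (math.sqrt(n - 3) / 2) * math.log(
        (1 + r) * (1 - rho0) / ((1 - r) * (1 + rho0)))

z = fisher_z(r=0.936, n=10)
z_crit = 2.575                     # z_{0.005}, two-sided test at 0.01
print(round(z, 2))                 # 4.51
print("reject H0" if abs(z) > z_crit else "cannot reject H0")  # reject H0
```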


THEORETICAL EXERCISES

1. Verify that the formula for t of Theorem 14.4 can be written as

$$t = \left(1 - \frac{\beta}{\hat{\beta}}\right)\frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

2. Use the formula for t of the preceding exercise to derive the following (1 − α)100% confidence limits for $\beta$:

$$\hat{\beta}\left[1 \pm t_{\alpha/2,n-2} \cdot \frac{\sqrt{1 - r^2}}{r\sqrt{n - 2}}\right]$$

3. Use the formula for t of Exercise 1 to show that if the assumptions underlying normal regression analysis are met and $\beta = 0$, then $r^2$ has a beta distribution with the mean $\frac{1}{n - 1}$.

4. By solving the double inequality $-z_{\alpha/2} < z < z_{\alpha/2}$ (with z given by the formula on page 477) for $\rho$, derive a (1 − α)100% confidence interval formula for $\rho$.


APPLIED EXERCISES 


5. An objective achievement test is said to be reliable if a student who takes 
the test several times will consistently get high (or low) scores. One way of 
checking the reliability of a test is to divide it into two parts, usually the 
even-numbered problems and the odd-numbered problems, and observe the 
correlation between the scores which students get in both halves of the test. 
Thus, the following data represent the grades, x and y, which 20 students 
obtained for the even-numbered problems and the odd-numbered problems 
of a new objective test designed to test eighth grade achievement in general 
science: 
x y x y

P 21-5199 33 4 
36 44 EUN): 

44 49 38 38 

3202] 24 2 

ZI 33 33 34 

41 33 32) ST 

38 29 37 38 

44 40 33111535 

зо 27 34 2 


27 38 39 43 


Calculate r for these data and test its significance, namely, the null hypothesis $\rho = 0$, at $\alpha = 0.05$.

6. With reference to the preceding exercise, use the formula obtained in Exercise 4 to construct a 95% confidence interval for $\rho$.


7. The following data pertain to x, the amount of fertilizer (in pounds) which 
a farmer applies to his soil, and y, his yield of wheat (in bushels per acre): 


x y x y x y 
12 33 88 24 Xa 
92 28 44 17 23:15 
72 38 132 36 22132 
66 17 23 14 142 38 
11250583 579); 125 255 13 
88 31 111 40 127 23 
42 8 69 29 88 31 
1206 37 19 12 48 37 
72.1232 103 27 61 25 
52 20 141 40 71 14 
28 17 тт 26 13 26 


Assuming that the data can be looked upon as a sample from a bivariate normal population, calculate r and test its significance at $\alpha = 0.01$. Judging from a scattergram of these paired data, does the assumption seem reasonable?

8. With reference to the preceding exercise, use the formula obtained in Exercise 4 to construct a 95% confidence interval for $\rho$.
9. Use the formula of Exercise 1 to calculate a 95% confidence interval for $\beta$ for the hours studied and test scores on page 453, and compare this interval with the one obtained in Example 14.6.


10. The calculation of r can often be simplified by adding the same constant to 
each x, adding the same constant to each y, or by multiplying each x and/or 
y by the same positive constant. Recalculate r for the data of Example 14.7 
by first multiplying each x and each y by 10, and then subtracting 70 from 
each x and 60 from each y. 


14.6 MULTIPLE LINEAR REGRESSION 


Although there are many problems in which one variable can be predicted quite 
accurately in terms of another, it stands to reason that predictions should improve 
if one considers additional relevant information. For instance, we should be able 
to make better predictions of the performance of newly hired teachers if we 
consider not only their education, but also their years of experience and their 
personality. Also, we should be able to make better predictions of a new textbook’s 
success if we consider not only the quality of the work, but also the potential 
demand and the competition. 

Although there are many different formulas that can be used to express regression relationships among more than two variables (see, for instance, Example 14.3), most widely used are linear equations of the form

$$\mu_{y|x_1, x_2, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

This is partly a matter of mathematical convenience and partly due to the fact that many relationships are actually of this form or can be approximated closely by linear equations.

In the equation above, y is the random variable whose values we want to predict in terms of given values of $x_1$, $x_2$, ..., and $x_k$, and $\beta_0$, $\beta_1$, $\beta_2$, ..., and $\beta_k$, the multiple regression coefficients, are numerical constants which must be determined from observed data.

To illustrate, consider the following equation which was obtained in a study of the demand for different meats:

$$\hat{y} = 3.489 - 0.090x_1 + 0.064x_2 + 0.019x_3$$

Here $\hat{y}$ denotes the estimated consumption of federally inspected beef and veal in millions of pounds, $x_1$ denotes a composite retail price of beef in cents per pound, $x_2$ denotes a composite retail price of pork in cents per pound, and $x_3$ denotes income as measured by a certain payroll index.
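As in Section 14.3, where there was only one independent variable x, multiple regression coefficients are usually estimated by the method of least squares. For n data points

$$\{(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i);\ i = 1, 2, \ldots, n\}$$

the least squares estimates of the $\beta$'s are the values $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$, ..., and $\hat{\beta}_k$ for which the quantity

$$q = \sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_k x_{ki})]^2$$

is a minimum. In this notation, $x_{1i}$ is the ith value of the variable $x_1$, $x_{2i}$ is the ith value of the variable $x_2$, and so on.

So, differentiating partially with respect to the $\hat{\beta}$'s, and equating these partial derivatives to zero, we get

$$\frac{\partial q}{\partial \hat{\beta}_0} = \sum_{i=1}^{n} (-2)[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_k x_{ki})] = 0$$
$$\frac{\partial q}{\partial \hat{\beta}_1} = \sum_{i=1}^{n} (-2)x_{1i}[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_k x_{ki})] = 0$$
$$\frac{\partial q}{\partial \hat{\beta}_2} = \sum_{i=1}^{n} (-2)x_{2i}[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_k x_{ki})] = 0$$
$$\cdots$$
$$\frac{\partial q}{\partial \hat{\beta}_k} = \sum_{i=1}^{n} (-2)x_{ki}[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \cdots + \hat{\beta}_k x_{ki})] = 0$$

and finally the k + 1 normal equations

$$\sum y = \hat{\beta}_0 \cdot n + \hat{\beta}_1 \sum x_1 + \hat{\beta}_2 \sum x_2 + \cdots + \hat{\beta}_k \sum x_k$$
$$\sum x_1 y = \hat{\beta}_0 \sum x_1 + \hat{\beta}_1 \sum x_1^2 + \hat{\beta}_2 \sum x_1 x_2 + \cdots + \hat{\beta}_k \sum x_1 x_k$$
$$\sum x_2 y = \hat{\beta}_0 \sum x_2 + \hat{\beta}_1 \sum x_2 x_1 + \hat{\beta}_2 \sum x_2^2 + \cdots + \hat{\beta}_k \sum x_2 x_k$$
$$\cdots$$
$$\sum x_k y = \hat{\beta}_0 \sum x_k + \hat{\beta}_1 \sum x_k x_1 + \hat{\beta}_2 \sum x_k x_2 + \cdots + \hat{\beta}_k \sum x_k^2$$

Here we abbreviated our notation by writing $\sum_{i=1}^{n} x_{1i}$ as $\sum x_1$, $\sum_{i=1}^{n} x_{1i}x_{2i}$ as $\sum x_1 x_2$, and so on.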
EXAMPLE 14.9
The following data show the number of bedrooms, the number of baths, and the prices at which a random sample of eight one-family houses sold recently in a certain large housing development:

Number of   Number of   Price
bedrooms    baths       (dollars)
x1          x2          y
3           2           78,800
2           1           74,300
4           3           83,800
2           1           74,200
3           2           79,700
2           2           74,900
5           3           88,400
4           2           82,900

Use the method of least squares to find a linear equation which will enable us to predict the average sales price of a one-family house in the given housing development in terms of the number of bedrooms and the number of baths.

Solution
The quantities we need for substitution into the three normal equations are n = 8, $\sum x_1 = 25$, $\sum x_2 = 16$, $\sum y = 637{,}000$, $\sum x_1^2 = 87$, $\sum x_1 x_2 = 55$, $\sum x_2^2 = 36$, $\sum x_1 y = 2{,}031{,}100$, and $\sum x_2 y = 1{,}297{,}700$, and we get

$$637{,}000 = 8\hat{\beta}_0 + 25\hat{\beta}_1 + 16\hat{\beta}_2$$
$$2{,}031{,}100 = 25\hat{\beta}_0 + 87\hat{\beta}_1 + 55\hat{\beta}_2$$
$$1{,}297{,}700 = 16\hat{\beta}_0 + 55\hat{\beta}_1 + 36\hat{\beta}_2$$

We could solve these equations by the method of elimination or by using determinants, but in view of the rather tedious calculations, such work is usually left to computers. Thus, let us refer to the printout of Figure 14.6, which shows in the column headed "COEFFICIENT" that $\hat{\beta}_0 = 65{,}191.7$, $\hat{\beta}_1 = 4{,}133.3$, and $\hat{\beta}_2 = 758.3$. After rounding, the least squares equation becomes

$$\hat{y} = 65{,}192 + 4{,}133x_1 + 758x_2$$
MTB > SET C1
DATA> 3 2 4 2 3 2 5 4
MTB > SET C2
DATA> 2 1 3 1 2 2 3 2
MTB > SET C3
DATA> 78800 74300 83800 74200 79700 74900 88400 82900
MTB > REGR C3 2 C1 C2

THE REGRESSION EQUATION IS
C3 = 65192 + 4133 C1 + 758 C2

                          ST. DEV.    T-RATIO =
  COLUMN    COEFFICIENT   OF COEF.    COEF/S.D.
                65191.7      418.0       155.96
  C1             4133.3      228.6        18.08
  C2              758.3      340.5         2.23

S = 370.4

R-SQUARED = 99.6 PERCENT
R-SQUARED = 99.5 PERCENT, ADJUSTED FOR D.F.

ANALYSIS OF VARIANCE

  DUE TO        DF           SS     MS=SS/DF
  REGRESSION     2    185269168     92634592
  RESIDUAL       5       685833       137167
  TOTAL          7    185955008

FURTHER ANALYSIS OF VARIANCE
SS EXPLAINED BY EACH VARIABLE WHEN ENTERED IN THE ORDER GIVEN

  DUE TO        DF           SS
  REGRESSION     2    185269168
  C1             1    184588800
  C2             1       680364

DURBIN-WATSON STATISTIC = 2.29

Figure 14.6 Computer printout for multiple regression example.


and this tells us that (in the given housing development and at the time the study was made) each extra bedroom adds on the average $4,133, and each bath $758, to the sales price of a house. ▲
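The text leaves the solution of the normal equations to a computer. As a sketch of what the method of elimination amounts to, the following Gauss-Jordan routine solves the three equations of Example 14.9 exactly. `solve_by_elimination` is a hypothetical helper; it uses Python's `fractions.Fraction` so that no rounding occurs until the final display.

```python
from fractions import Fraction

def solve_by_elimination(lhs, rhs):
    """Solve lhs * b = rhs by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(rhs)
    # Augmented matrix [lhs | rhs], stored as Fractions so no rounding occurs.
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(lhs, rhs)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the normal equations have no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot equals 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this unknown from every other equation.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

LHS = [[8, 25, 16], [25, 87, 55], [16, 55, 36]]
RHS = [637000, 2031100, 1297700]

b0, b1, b2 = solve_by_elimination(LHS, RHS)
print([round(float(b), 1) for b in (b0, b1, b2)])  # [65191.7, 4133.3, 758.3]
```

The exact solutions are 5,476,100/84, 347,200/84, and 63,700/84; rounded to one decimal they match the COEFFICIENT column of Figure 14.6.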


EXAMPLE 14.10

Based on the result of the preceding example, predict the sales price of a three-bedroom house with two baths.
Solution

Substituting $x_1 = 3$ and $x_2 = 2$ into the equation obtained above, we get

$$\hat{y} = 65{,}192 + 4{,}133(3) + 758(2) = \$79{,}107$$

or approximately $79,100. ▲
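A minimal sketch of the substitution, assuming the coefficients are stored as a tuple (b0, b1, ..., bk); `predict` is an illustrative name. It also shows that using the unrounded coefficients changes the prediction by only about a dollar.

```python
from fractions import Fraction

def predict(coefs, xs):
    """Evaluate b0 + b1*x1 + ... + bk*xk for one set of x values."""
    b0, *slopes = coefs
    return b0 + sum(b * x for b, x in zip(slopes, xs))

rounded = (65192, 4133, 758)  # the rounded equation of Example 14.9
exact = (Fraction(5476100, 84), Fraction(347200, 84), Fraction(63700, 84))

print(predict(rounded, (3, 2)))                 # 79107
print(round(float(predict(exact, (3, 2))), 2))  # 79108.33
```

Both values round to approximately $79,100.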


Printouts like those of Figure 14.6 also provide information that is needed to make inferences about the multiple regression coefficients and to judge the merits of estimates or predictions based on the least squares equations. This corresponds to the work of Section 14.4, but we shall defer it until Section 14.7, where we shall study the whole problem of multiple linear regression in a much more compact notation.
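The ANALYSIS OF VARIANCE and R-SQUARED lines of Figure 14.6 can be reproduced from the fitted equation. This sketch assumes the unrounded coefficients 5,476,100/84, 347,200/84, and 63,700/84 (their rounded values appear in the printout) and the usual definitions: residual SS is the sum of the squared residuals, total SS is the sum of the squared deviations of y from its mean, R-squared is regression SS divided by total SS, and the adjusted version first divides each of these sums of squares by its degrees of freedom. The printout's last digits (185955008 instead of 185,955,000) differ slightly, presumably because the program worked in limited-precision arithmetic.

```python
from fractions import Fraction

X_DATA = [(3, 2), (2, 1), (4, 3), (2, 1), (3, 2), (2, 2), (5, 3), (4, 2)]
Y_DATA = [78800, 74300, 83800, 74200, 79700, 74900, 88400, 82900]
B = (Fraction(5476100, 84), Fraction(347200, 84), Fraction(63700, 84))

n, k = len(Y_DATA), 2
y_bar = Fraction(sum(Y_DATA), n)
fitted = [B[0] + B[1] * x1 + B[2] * x2 for x1, x2 in X_DATA]

sse = sum((y - f) ** 2 for y, f in zip(Y_DATA, fitted))  # RESIDUAL SS
sst = sum((y - y_bar) ** 2 for y in Y_DATA)              # TOTAL SS
ssr = sst - sse                                          # REGRESSION SS
r2 = ssr / sst
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(round(sse), round(ssr), sst)             # 685833 185269167 185955000
print(f"{float(r2):.1%} {float(r2_adj):.1%}")  # 99.6% 99.5%
```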


14.7 MULTIPLE LINEAR REGRESSION (Matrix Notation)*


The model we are using in multiple linear regression lends itself uniquely to a 
unified treatment in matrix notation. This notation makes it possible to state 
general results in compact form and to use to great advantage many of the results 
of matrix theory. 

We could introduce the matrix approach by expressing the sum of squares q (which we minimized in the preceding section by differentiating partially with respect to the $\hat\beta$'s) in matrix notation and taking it from there, but leaving this to the reader in Exercise 1 on page 490, let us begin here with the normal equations on page 481.

It is customary to denote matrices by capital letters in boldface type, but 
since we are using boldface here for random variables, our symbols for matrices 
will be in ordinary lightface type. 

To express the normal equations in matrix notation, let us define the 
following three matrices: 


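$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \text{and} \quad B = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}$$

* It is assumed for this section that the reader is familiar with the material ordinarily covered in a first course on matrix algebra. Since matrix notation is not used elsewhere in this book, this section may be omitted without loss of continuity.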


The first one, X, is an $n \times (k + 1)$ matrix consisting essentially of the given values of the x's, with the column of 1's appended to accommodate the constant terms. Y is an $n \times 1$ matrix (or column vector) consisting of the observed values of y, and B is a $(k + 1) \times 1$ matrix (or column vector) consisting of the least squares estimates of the regression coefficients.

Using these matrices, we can now write the following symbolic solution of 
the normal equations on page 481. 


THEOREM 14.7 The least squares estimates of the multiple regression coefficients are given by

$$B = (X'X)^{-1}X'Y$$

where X' is the transpose of X and $(X'X)^{-1}$ is the inverse of X'X.


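Proof. First we determine X'X, X'XB, and X'Y, getting

$$X'X = \begin{pmatrix} n & \sum x_1 & \sum x_2 & \cdots & \sum x_k \\ \sum x_1 & \sum x_1^2 & \sum x_1 x_2 & \cdots & \sum x_1 x_k \\ \sum x_2 & \sum x_2 x_1 & \sum x_2^2 & \cdots & \sum x_2 x_k \\ \vdots & \vdots & \vdots & & \vdots \\ \sum x_k & \sum x_k x_1 & \sum x_k x_2 & \cdots & \sum x_k^2 \end{pmatrix}$$

$$X'XB = \begin{pmatrix} \hat\beta_0 n + \hat\beta_1 \sum x_1 + \hat\beta_2 \sum x_2 + \cdots + \hat\beta_k \sum x_k \\ \hat\beta_0 \sum x_1 + \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2 + \cdots + \hat\beta_k \sum x_1 x_k \\ \hat\beta_0 \sum x_2 + \hat\beta_1 \sum x_2 x_1 + \hat\beta_2 \sum x_2^2 + \cdots + \hat\beta_k \sum x_2 x_k \\ \vdots \\ \hat\beta_0 \sum x_k + \hat\beta_1 \sum x_k x_1 + \hat\beta_2 \sum x_k x_2 + \cdots + \hat\beta_k \sum x_k^2 \end{pmatrix}$$

$$X'Y = \begin{pmatrix} \sum y \\ \sum x_1 y \\ \sum x_2 y \\ \vdots \\ \sum x_k y \end{pmatrix}$$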
Identifying the elements of X'XB as the expressions on the right-hand side of the normal equations on page 481, and those of X'Y as the expressions on the left-hand side, we can write

$$X'XB = X'Y$$

Multiplying on the left by $(X'X)^{-1}$, we get

$$(X'X)^{-1}X'XB = (X'X)^{-1}X'Y$$

and finally

$$B = (X'X)^{-1}X'Y$$

since $(X'X)^{-1}X'X$ equals the $(k + 1) \times (k + 1)$ identity matrix I and by definition IB = B. We have assumed here that X'X is nonsingular, so that its inverse exists. ■
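The theorem translates directly into code. The following sketch is plain Python with exact rational arithmetic; `transpose`, `matmul`, `inverse`, and `least_squares` are illustrative helpers, not library functions. It builds X with its column of 1's, forms X'X and X'Y, and computes $B = (X'X)^{-1}X'Y$. Numerical libraries usually solve X'XB = X'Y directly instead of forming the inverse, but the explicit inverse mirrors the statement of Theorem 14.7.

```python
from fractions import Fraction

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Gauss-Jordan inverse of a square matrix, exact over the rationals."""
    n = len(a)
    # Augment with the identity matrix: [a | I].
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")  # X'X must be nonsingular
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]  # the right half now holds the inverse

def least_squares(rows, ys):
    """B = (X'X)^{-1} X'Y, with X built by prepending a column of 1's."""
    X = [[1] + list(r) for r in rows]
    Y = [[y] for y in ys]  # n x 1 column vector
    Xt = transpose(X)
    return matmul(inverse(matmul(Xt, X)), matmul(Xt, Y))

B = least_squares([(3, 2), (2, 1), (4, 3), (2, 1), (3, 2), (2, 2), (5, 3), (4, 2)],
                  [78800, 74300, 83800, 74200, 79700, 74900, 88400, 82900])
print([round(float(b[0]), 1) for b in B])  # [65191.7, 4133.3, 758.3]
```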


EXAMPLE 14.11 


With reference to Example 14.9, use Theorem 14.7 to determine the least squares 
estimates of the multiple regression coefficients. 


Solution 


Substituting $\sum x_1 = 25$, $\sum x_2 = 16$, $\sum x_1^2 = 87$, $\sum x_1 x_2 = 55$, $\sum x_2^2 = 36$, and $n = 8$ from page 482 into the expression for X'X on page 485, we get

$$X'X = \begin{pmatrix} 8 & 25 & 16 \\ 25 & 87 & 55 \\ 16 & 55 & 36 \end{pmatrix}$$


Then, the inverse of this matrix can be obtained by any one of a number of different techniques; using the one based on cofactors, we find that

$$(X'X)^{-1} = \frac{1}{84}\begin{pmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{pmatrix}$$

where 84 is the value of |X'X|, the determinant of X'X.
Substituting $\sum y = 637{,}000$, $\sum x_1 y = 2{,}031{,}100$, and $\sum x_2 y = 1{,}297{,}700$ from page 482 into the expression for X'Y on page 485, we then get

$$X'Y = \begin{pmatrix} 637{,}000 \\ 2{,}031{,}100 \\ 1{,}297{,}700 \end{pmatrix}$$

and finally

$$B = (X'X)^{-1}X'Y = \frac{1}{84}\begin{pmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{pmatrix}\begin{pmatrix} 637{,}000 \\ 2{,}031{,}100 \\ 1{,}297{,}700 \end{pmatrix} = \frac{1}{84}\begin{pmatrix} 5{,}476{,}100 \\ 347{,}200 \\ 63{,}700 \end{pmatrix} = \begin{pmatrix} 65{,}191.7 \\ 4{,}133.3 \\ 758.3 \end{pmatrix}$$

where the $\hat\beta$'s are rounded to one decimal. Note that the results obtained here are identical with those shown in the computer printout of Figure 14.6. ▲
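The cofactor technique can be checked mechanically for this 3 × 3 case: the inverse equals the adjugate (the transposed matrix of cofactors) divided by the determinant. The helpers `det3` and `adjugate3` below are hypothetical names for a short Python sketch that reproduces the numbers of Example 14.11.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def adjugate3(m):
    """Transpose of the matrix of cofactors of a 3 x 3 matrix."""
    def minor(r, c):
        rows = [row for i, row in enumerate(m) if i != r]
        (p, q), (s, t) = [[v for j, v in enumerate(row) if j != c] for row in rows]
        return p * t - q * s
    cof = [[(-1) ** (r + c) * minor(r, c) for c in range(3)] for r in range(3)]
    return [list(col) for col in zip(*cof)]

XTX = [[8, 25, 16], [25, 87, 55], [16, 55, 36]]
XTY = [637000, 2031100, 1297700]

d = det3(XTX)                                                # 84
adj = adjugate3(XTX)                                         # 84 times the inverse
numer = [sum(a * y for a, y in zip(row, XTY)) for row in adj]
B = [Fraction(v, d) for v in numer]
print(d, adj, numer)
print([round(float(b), 1) for b in B])                       # [65191.7, 4133.3, 758.3]
```

Keeping the factor 1/84 outside until the last step, as the example does, means every intermediate quantity stays an integer.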


Next, to generalize the work of Section 14.4, we make assumptions that are very similar to those on page 463: we assume that for i = 1, 2, ..., and n, the $y_i$ are independent random variables having normal distributions with the means $\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$ and the common standard deviation σ. Based on n data points

$$(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)$$

we can then make all sorts of inferences about the parameters of our model, the β's and σ, and judge the merits of estimates and predictions based on the estimated multiple regression equation.

Finding maximum likelihood estimates of the β's and σ is straightforward, as on page 464, and it will be left to the reader in Exercise 2 on page 491. The results are as follows: The maximum likelihood estimates of the β's equal the corresponding least squares estimates, so they are given by the elements of the $(k + 1) \times 1$ column matrix

$$B = (X'X)^{-1}X'Y$$


The maximum likelihood estimate of σ is given by

$$\hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}[y_i - (\hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik})]^2}$$

where the $\hat\beta$'s are the maximum likelihood estimates of the β's, and as the reader will be asked to verify in Exercise 3 on page 491, it can also be written as

$$\hat\sigma = \sqrt{\frac{Y'Y - B'X'Y}{n}}$$

in matrix notation.
EXAMPLE 14.12

Use the results of Example 14.11 to determine the value of $\hat\sigma$ for the data of Example 14.9.

Solution
First let us calculate Y'Y, which is simply $\sum_{i=1}^{n} y_i^2$, so we get

$$Y'Y = 78{,}800^2 + 74{,}300^2 + \cdots + 82{,}900^2 = 50{,}907{,}080{,}000$$

Then, copying B and X'Y from page 487, we get

$$B'X'Y = \frac{1}{84}\begin{pmatrix} 5{,}476{,}100 & 347{,}200 & 63{,}700 \end{pmatrix}\begin{pmatrix} 637{,}000 \\ 2{,}031{,}100 \\ 1{,}297{,}700 \end{pmatrix} = 50{,}906{,}394{,}166$$

and it follows that

$$\hat\sigma = \sqrt{\frac{50{,}907{,}080{,}000 - 50{,}906{,}394{,}166}{8}} = 292.8$$

▲
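Both expressions for $\hat\sigma$ can be evaluated side by side. This sketch keeps B as exact fractions (as in Example 14.11), computes Y'Y − B'X'Y, and separately averages the squared residuals; the two routes agree, which is what Exercise 3 asks the reader to verify in general.

```python
import math
from fractions import Fraction

X_DATA = [(3, 2), (2, 1), (4, 3), (2, 1), (3, 2), (2, 2), (5, 3), (4, 2)]
Y_DATA = [78800, 74300, 83800, 74200, 79700, 74900, 88400, 82900]
B = (Fraction(5476100, 84), Fraction(347200, 84), Fraction(63700, 84))
n = len(Y_DATA)

# Matrix route: Y'Y - B'X'Y.
yty = sum(y * y for y in Y_DATA)
xty = (sum(Y_DATA),
       sum(x1 * y for (x1, _), y in zip(X_DATA, Y_DATA)),
       sum(x2 * y for (_, x2), y in zip(X_DATA, Y_DATA)))
btxty = sum(b * v for b, v in zip(B, xty))
sigma_matrix = math.sqrt(float(yty - btxty) / n)

# Direct route: mean of the squared residuals.
resid = [y - (B[0] + B[1] * x1 + B[2] * x2) for (x1, x2), y in zip(X_DATA, Y_DATA)]
sigma_direct = math.sqrt(float(sum(e * e for e in resid)) / n)

print(yty, math.floor(btxty))                          # 50907080000 50906394166
print(round(sigma_matrix, 1), round(sigma_direct, 1))  # 292.8 292.8
```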
It is of interest to note that the estimate which we obtained here does not equal the one shown in the computer printout of Figure 14.6. The estimate shown there, s = 370.4, is such that $s^2$ is an unbiased estimate of $\sigma^2$, analogous to the standard error of estimate which we defined on page 469. It differs from $\hat\sigma$ in that we divide by n − k − 1 instead of n, and if we had done so in our example, we would have obtained

$$s = \sqrt{\frac{50{,}907{,}080{,}000 - 50{,}906{,}394{,}166}{8 - 2 - 1}} = 370.4$$
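Because both estimates start from the same residual sum of squares, they differ only by a constant factor. A small sketch using the rounded value of Y'Y − B'X'Y from Example 14.12:

```python
import math

sse = 50_907_080_000 - 50_906_394_166  # Y'Y - B'X'Y as printed in Example 14.12
n, k = 8, 2

sigma_hat = math.sqrt(sse / n)    # divides by n (maximum likelihood estimate)
s = math.sqrt(sse / (n - k - 1))  # divides by n - k - 1 (the printout's S)

print(round(sigma_hat, 1), round(s, 1))  # 292.8 370.4
# The two differ only by the constant factor sqrt(n / (n - k - 1)).
print(math.isclose(s, sigma_hat * math.sqrt(n / (n - k - 1))))  # True
```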


Proceeding as in Section 14.4, we investigate next the sampling distributions of the $\hat\beta_i$ for i = 0, 1, ..., k and of $\hat\sigma$. Leaving the details to the reader, let us merely point out that arguments similar to those on page 465 lead to the results that the $\hat\beta_i$ are linear combinations of the n independent normal random variables $y_i$, so that the $\hat\beta_i$ themselves have normal distributions. Furthermore, they are unbiased estimators, that is,

$$E(\hat\beta_i) = \beta_i \quad \text{for } i = 0, 1, \ldots, k$$

and their variances are given by

$$\operatorname{var}(\hat\beta_i) = c_{ii}\sigma^2 \quad \text{for } i = 0, 1, \ldots, k$$

Here $c_{ij}$ is the element in the ith row and the jth column of the matrix $(X'X)^{-1}$, with i and j taking on the values 0, 1, ..., k.

Let us also state the result that, analogous to Theorem 14.3, the sampling distribution of $n\hat\sigma^2/\sigma^2$ is the chi-square distribution with n − k − 1 degrees of freedom, and that $n\hat\sigma^2/\sigma^2$ and $\hat\beta_i$ are independent for i = 0, 1, ..., k. Combining all these results, we find that the definition of the t distribution in Section 8.5 leads to


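THEOREM 14.8 Under the assumptions of normal multiple regression analysis,

$$t = \frac{\hat\beta_i - \beta_i}{\hat\sigma\sqrt{\dfrac{n\,c_{ii}}{n - k - 1}}} \quad \text{for } i = 0, 1, \ldots, k$$

are values of random variables having the t distribution with n − k − 1 degrees of freedom.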
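As a sketch of how Theorem 14.8 yields the ST. DEV. OF COEF. and T-RATIO columns of Figure 14.6: the denominator $\hat\sigma\sqrt{n c_{ii}/(n-k-1)}$ is the estimated standard deviation of $\hat\beta_i$, and the printout's T-RATIO is the t statistic for the particular null hypothesis $\beta_i = 0$. The inputs are the exact values from Examples 14.11 and 14.12, written as fractions with denominator 84; the code is illustrative, not a reproduction of the program's own algorithm.

```python
import math

n, k = 8, 2
sse = 57_610_000 / 84                  # Y'Y - B'X'Y, unrounded (about 685,833.33)
sigma_hat = math.sqrt(sse / n)         # maximum likelihood estimate of sigma
coefs = [5_476_100 / 84, 347_200 / 84, 63_700 / 84]
c_diag = [107 / 84, 32 / 84, 71 / 84]  # c_00, c_11, c_22 of (X'X)^{-1}

rows = []
for b, c in zip(coefs, c_diag):
    se = sigma_hat * math.sqrt(n * c / (n - k - 1))            # denominator in Theorem 14.8
    rows.append((round(b, 1), round(se, 1), round(b / se, 2)))  # t for beta_i = 0

for row in rows:
    print(row)
# (65191.7, 418.0, 155.96)
# (4133.3, 228.6, 18.08)
# (758.3, 340.5, 2.23)
```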
Based on this theorem, let us now test a hypothesis about one of the multiple regression coefficients.


EXAMPLE 14.13

With reference to Example 14.9, test the null hypothesis $\beta_1 = \$3{,}500$ against the alternative hypothesis $\beta_1 > \$3{,}500$ at the 0.05 level of significance.


Solution 


1. $H_0$: $\beta_1 = 3{,}500$
   $H_1$: $\beta_1 > 3{,}500$

2. Reject the null hypothesis if t > 2.015, where t is determined in accordance with Theorem 14.8 and 2.015 is the value of $t_{0.05,5}$ according to Table IV.

3. Substituting n = 8, $\hat\beta_1 = 4{,}133.3$, and $c_{11} = \frac{32}{84}$ from Example 14.11, and $\hat\sigma = 292.8$ from Example 14.12 into the formula for t, we get

$$t = \frac{4{,}133.3 - 3{,}500}{292.8\sqrt{\dfrac{8 \cdot \frac{32}{84}}{8 - 2 - 1}}} = \frac{4{,}133.3 - 3{,}500}{228.6} = 2.77$$

4. Since t = 2.77 exceeds 2.015, the null hypothesis must be rejected; we conclude that on the average each additional bedroom adds more than $3,500 to the sales price of such a house. (Note that the value in the denominator of the t statistic, 228.6, equals the second value in the column headed "ST. DEV. OF COEF." in the computer printout of Figure 14.6.) ▲
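The four steps reduce to a few lines of arithmetic. In this sketch `t_statistic` is an illustrative helper, and the critical value 2.015 is the Table IV entry quoted in step 2, hard-coded rather than computed, since no statistical library is assumed.

```python
import math

def t_statistic(b_hat, b_null, sigma_hat, c_ii, n, k):
    """t of Theorem 14.8 for testing H0: beta_i = b_null."""
    return (b_hat - b_null) / (sigma_hat * math.sqrt(n * c_ii / (n - k - 1)))

t = t_statistic(4133.3, 3500, sigma_hat=292.8, c_ii=32 / 84, n=8, k=2)
T_CRIT = 2.015  # t_{0.05} with 5 degrees of freedom, taken from Table IV
print(round(t, 2), "reject H0" if t > T_CRIT else "do not reject H0")
# 2.77 reject H0
```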


Analogous to Theorem 14.5, we can also use the t statistic of Theorem 14.8 to construct confidence intervals for multiple regression coefficients (see Exercise 7 below).
THEORETICAL EXERCISES 


1. If b is a column vector of estimates of the β's, verify in matrix notation that $q = (Y - Xb)'(Y - Xb)$ is a minimum when $b = B = (X'X)^{-1}X'Y$.
2. Verify that under the assumptions of normal multiple regression analysis
(a) the maximum likelihood estimates of the β's equal the corresponding least squares estimates;
(b) the maximum likelihood estimate of σ is

$$\hat\sigma = \sqrt{\frac{(Y - XB)'(Y - XB)}{n}}$$

3. Verify that the estimate of part (b) of the preceding exercise can also be written as

$$\hat\sigma = \sqrt{\frac{Y'Y - B'X'Y}{n}}$$

4. Show that $\hat\sigma^2$ is a biased estimator of $\sigma^2$, but that it can be made unbiased by dividing by n − k − 1 instead of n.

5. Show that under the assumptions of normal multiple regression analysis
(a) $E(\hat\beta_i) = \beta_i$ for i = 0, 1, ..., k;
(b) $\operatorname{var}(\hat\beta_i) = c_{ii}\sigma^2$ for i = 0, 1, ..., k;
(c) $\operatorname{cov}(\hat\beta_i, \hat\beta_j) = c_{ij}\sigma^2$ for i ≠ j = 0, 1, ..., k.

6. Show that for k = 1 the formulas of the preceding exercise are equivalent to those given on page 465 and in Exercises 4 and 5 on page 469.

7. Use the t statistic of Theorem 14.8 to construct a (1 − α)100% confidence interval formula for $\beta_i$ for i = 0, 1, ..., k.

8. If $x_{01}, x_{02}, \ldots$, and $x_{0k}$ are given values of the variables $x_1, x_2, \ldots$, and $x_k$, and $X_0$ is the column vector

$$X_0 = \begin{pmatrix} 1 \\ x_{01} \\ x_{02} \\ \vdots \\ x_{0k} \end{pmatrix}$$

it can be shown that

$$t = \frac{B'X_0 - \mu_{Y|x_{01}, x_{02}, \ldots, x_{0k}}}{\hat\sigma\sqrt{\dfrac{n[X_0'(X'X)^{-1}X_0]}{n - k - 1}}}$$

is a value of a random variable having the t distribution with n − k − 1 degrees of freedom.

(a) Show that for k = 1 this statistic is equivalent to the one of Exercise 7 on page 470.

(b) Derive a (1 − α)100% confidence interval formula for $\mu_{Y|x_{01}, x_{02}, \ldots, x_{0k}}$.

9. With $x_{01}, x_{02}, \ldots, x_{0k}$ and $X_0$ as defined in the preceding exercise, and $y_0$ being a random variable having a normal distribution with the mean $\beta_0 + \beta_1 x_{01} + \cdots + \beta_k x_{0k}$ and the variance $\sigma^2$, it can be shown that

$$t = \frac{y_0 - B'X_0}{\hat\sigma\sqrt{\dfrac{n[1 + X_0'(X'X)^{-1}X_0]}{n - k - 1}}}$$

is a value of a random variable having the t distribution with n − k − 1 degrees of freedom.

(a) Show that for k = 1 this statistic is equivalent to the one of Exercise 9 on page 470.

(b) Derive a formula for (1 − α)100% limits of prediction for a future observation of $y_0$.


APPLIED EXERCISES 


10. The following are sample data provided by a moving company on the weights of six shipments, the distances they were moved, and the damage that was incurred:

Weight         Distance         Damage
(1,000 lb)     (1,000 miles)    (dollars)
   x1              x2              y
   4.0             1.5            160
   3.0             2.2            112
   1.6             1.0             69
   1.2             2.0             90
   3.4             0.8            123
   4.8             1.6            186

(a) Assuming that the regression is linear, estimate $\beta_0$, $\beta_1$, and $\beta_2$.
(b) Use the results of part (a) to estimate the damage when a shipment weighing 2,400 pounds is moved 1,200 miles.
11. The following are data on the average weekly profits (in $1,000) of five restaurants, their seating capacities, and the average daily traffic (in thousands of cars) which passes their locations:

Seating     Traffic     Weekly net
capacity    count       profit
   x1          x2          y
  120          19        23.8
  200           8        24.2
  150          12        22.0
  180          15        26.2
  240          16        33.5

(a) Assuming that the regression is linear, estimate $\beta_0$, $\beta_1$, and $\beta_2$.

(b) Use the results of part (a) to predict the average weekly net profit of a restaurant with a seating capacity of 210 at a location where the daily traffic count averages 14,000 cars.

12. The following data consist of the scores which ten students obtained in an examination, their I.Q.'s, and the numbers of hours they spent studying for the examination:

           Number of
           hours
I.Q.       studied      Score
 x1          x2           y
112           5          79
126          13          97
100           3          51
114           7          65
112          11          82
121           9          93
110           8          81
103           4          38
111           6          60
124           2          86

(a) Assuming that the regression is linear, estimate $\beta_0$, $\beta_1$, and $\beta_2$.
(b) Predict the score of a student with an I.Q. of 108 who studied six hours for the examination.
13. The following data were collected to determine the relationship between two processing variables and the hardness of a certain kind of steel:

Hardness           Copper content    Annealing temperature
(Rockwell 30-T)    (percent)         (degrees F)
    y                 x1                 x2
  78.9               0.02              1,000
  55.2               0.02              1,200
  80.9               0.10              1,000
  57.4               0.10              1,200
  85.3               0.18              1,000
  60.7               0.18              1,200

Fit a straight line by the method of least squares, and use it to estimate the average hardness of this kind of steel when the copper content is 0.14 percent and the annealing temperature is 1,100 degrees F.

14. When the $x_1$'s, $x_2$'s, ..., and/or the $x_k$'s are equally spaced, the calculation of the $\hat\beta$'s can be simplified by using the coding suggested in Exercise 19 on page 462. Rework the preceding exercise coding the $x_1$-values −1, 0, and 1, and the $x_2$-values −1 and 1. (Note that for the coded $x_1$'s and $x_2$'s, call them $z_1$'s and $z_2$'s, we have not only $\sum z_1 = 0$ and $\sum z_2 = 0$, but also $\sum z_1 z_2 = 0$.)

15. The following are data on the percent effectiveness of a pain reliever and the amounts of three different medications (in milligrams) present in each capsule:

Medication A   Medication B   Medication C   Percent effective
     x1             x2             x3               y
     15             20             10              47
     15             20             20              54
     15             30             10              58
     15             30             20              66
     30             20             10              59
     30             20             20              67
     30             30             10              71
     30             30             20              83
     45             20             10              72
     45             20             20              82
     45             30             10              85
     45             30             20              94

Assuming that the regression is linear, estimate the regression coefficients after suitably coding each of the x's, and express the estimated regression line in terms of the original variables.

16. The regression models we introduced in Sections 14.2 and 14.6 are linear in the x's, but more importantly, they are also linear in the β's. Indeed, they can be used in some problems where the relationship between the x's and y is not linear. For instance, when the regression is parabolic and of the form

$$\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2$$

we simply use the regression equation $\mu_{Y|x_1, x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ with $x_1 = x$ and $x_2 = x^2$. Use this method to fit a parabola to the following data on the drying time of a varnish and the amount of a certain chemical that has been added:

Amount of additive    Drying time
(grams)               (hours)
     x                    y
     1                   8.5
     2                   8.0
     3                   6.0
     4                   5.0
     5                   6.0
     6                   5.5
     7                   6.5
     8                   7.0

Also, predict the drying time when 6.5 grams of the chemical are added.


17. The following data pertain to the demand for a product (in thousands of units) and its price (in cents) charged in five different market areas:

Price    Demand
  x        y
 20       22
 16       41
 10      120
 11       89
 14       56

Fit a parabola to these data by the method suggested in the preceding exercise.
18. To judge whether it was worthwhile to fit a parabola in the preceding exercise and not just a straight line, test the null hypothesis $\beta_2 = 0$ against the alternative hypothesis $\beta_2 \neq 0$ at the 0.05 level of significance.

19. Use the results obtained for the data of Example 14.9 in Section 14.7 to construct a 90% confidence interval for the regression coefficient $\beta_1$ (see Exercise 7 above).

20. With reference to Exercise 10, test the null hypothesis $\beta_2 = 10.0$ against the alternative hypothesis $\beta_2 \neq 10.0$ at the 0.05 level of significance.

21. With reference to Exercise 10, construct a 95% confidence interval for the regression coefficient $\beta_1$.

22. With reference to Exercise 11, test the null hypothesis $\beta_1 = 0.12$ against the alternative hypothesis $\beta_1 < 0.12$ at the 0.05 level of significance.

23. With reference to Exercise 11, construct a 98% confidence interval for the regression coefficient $\beta_2$.

24. Use the results obtained for the data of Example 14.9 in Section 14.7 and the result of part (b) of Exercise 8 to construct a 95% confidence interval for the mean sales price of a three-bedroom house with two baths in the given housing development.

25. Use the results obtained for the data of Example 14.9 in Section 14.7 and the result of part (b) of Exercise 9 to construct 99% limits of prediction for the sales price of a three-bedroom house with two baths in the given housing development.

26. With reference to Exercise 10, use the result of part (b) of Exercise 8 to construct a 98% confidence interval for the mean damage of 2,400-pound shipments that are moved 1,200 miles.

27. With reference to Exercise 10, use the result of part (b) of Exercise 9 to construct 95% limits of prediction for the damage that will be incurred by a 2,400-pound shipment that is moved 1,200 miles.

28. With reference to Exercise 11, use the result of part (b) of Exercise 8 to construct a 99% confidence interval for the mean weekly net profit of restaurants with a seating capacity of 210 at a location where the daily traffic count averages 14,000 cars.

29. With reference to Exercise 11, use the result of part (b) of Exercise 9 to construct 98% limits of prediction for the average weekly net profit of a restaurant with a seating capacity of 210 at a location where the daily traffic count averages 14,000 cars.
Analysis of Variance


15.1 INTRODUCTION


In this chapter we shall generalize the work of Section 13.3 and consider the 
problem of deciding whether observed differences among more than two sample 
means can be attributed to chance, or whether there are real differences among 
the means of the populations sampled. For instance, we may want to decide on 
the basis of sample data whether there really is a difference in the effectiveness 
of three methods of teaching a foreign language, we may want to compare the 
average yields per acre of six varieties of wheat, or we may want to see whether 
there really is a difference in the average mileage obtained with four kinds of 
gasoline. 

Since observed differences can always be due to causes other than those 
postulated—for instance, differences in the performance of students taught a 
foreign language by three different methods may be due to differences in intel- 
ligence, and differences in the average mileage obtained with four kinds of 
gasoline may be due to differences in road conditions—we shall also discuss 
some questions of experimental design, so that, with reasonable assurance, statisti- 
cally significant results can be attributed to particular causes. 
15.2 ONE-WAY ANALYSIS OF VARIANCE


To give an example of a typical situation where we would perform a one-way 
analysis of variance, suppose that we want to compare the cleansing action of 
three detergents on the basis of the following whiteness readings made on 15 
swatches of white cloth, which were first soiled with India ink and then washed 
in an agitator-type machine with the respective detergents: 


Detergent A: 77, 81, 71, 76, 80 
Detergent B: 72, 58, 74, 66, 70 
Detergent C: 76, 85, 82, 80, 77


The means of these three samples are 77, 68, and 80, and we want to know 
whether the differences among them are significant or whether they can be 
attributed to chance. 

In general, in a problem like this, we have independent random samples of size n from k populations. The jth value from the ith population is denoted $x_{ij}$, that is,

Population 1: $x_{11}, x_{12}, \ldots, x_{1n}$
Population 2: $x_{21}, x_{22}, \ldots, x_{2n}$
  ⋮
Population k: $x_{k1}, x_{k2}, \ldots, x_{kn}$


and we shall assume that the corresponding random variables $x_{ij}$, which are all independent, have normal distributions with the respective means $\mu_i$ and the common variance $\sigma^2$. Stating these assumptions somewhat differently, we could say that the model for the observations is given by

$$x_{ij} = \mu_i + \varepsilon_{ij}$$

for i = 1, 2, ..., k and j = 1, 2, ..., n, where the $\varepsilon_{ij}$ are values of nk independent random variables having normal distributions with zero means and the common variance $\sigma^2$. To permit the generalization of this model to more complicated kinds of situations (see page 510), it is usually written in the form

$$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

for i = 1, 2, ..., k and j = 1, 2, ..., n. Here μ is referred to as the grand mean, and the $\alpha_i$, called the treatment effects, are such that $\sum_{i=1}^{k} \alpha_i = 0$. Note that we have merely written the mean of the ith population as $\mu_i = \mu + \alpha_i$ and imposed the condition $\sum_{i=1}^{k} \alpha_i = 0$ so that the mean of the $\mu_i$ equals the grand mean μ.
The practice of referring to the different populations as different treatments is 
due to the fact that many analysis-of-variance techniques were originally 
developed in connection with agricultural experiments where different fertilizers, 
for example, were regarded as different treatments applied to the soil. Thus, we 
shall refer to the three detergents of our example on page 499 as three different 
treatments, and in other problems we may refer to four nationalities as four 
different treatments, five kinds of advertising campaigns as five different treat- 
ments, and so on. 
The null hypothesis we shall want to test is that the population means are all equal, namely, that $\mu_1 = \mu_2 = \cdots = \mu_k$, or equivalently that

$$H_0: \alpha_i = 0 \quad \text{for } i = 1, 2, \ldots, k$$

Correspondingly, the alternative hypothesis is that the population means are not all equal, namely, that

$$H_1: \alpha_i \neq 0 \quad \text{for at least one value of } i$$


The test, itself, is based on an analysis of the total variability of the combined data (nk − 1 times their variance), which is given by

$$\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{\cdot\cdot})^2 \quad \text{where} \quad \bar{x}_{\cdot\cdot} = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}$$

If the null hypothesis is true, all this variability is due to chance, but if it is not true, then part of the above sum of squares is due to the differences among the population means. To isolate, or separate, these two contributions to the total variability of the data, we refer to the following theorem:


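THEOREM 15.1

$$\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{\cdot\cdot})^2 = n\sum_{i=1}^{k}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i\cdot})^2$$

where $\bar{x}_{i\cdot}$ is the mean of the observations from the ith population and $\bar{x}_{\cdot\cdot}$ is the mean of all nk observations.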
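Proof.

$$\begin{aligned}
\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{\cdot\cdot})^2 &= \sum_{i=1}^{k}\sum_{j=1}^{n}[(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}) + (x_{ij} - \bar{x}_{i\cdot})]^2 \\
&= \sum_{i=1}^{k}\sum_{j=1}^{n}[(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 + 2(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})(x_{ij} - \bar{x}_{i\cdot}) + (x_{ij} - \bar{x}_{i\cdot})^2] \\
&= n\sum_{i=1}^{k}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 + 2\sum_{i=1}^{k}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i\cdot}) + \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i\cdot})^2 \\
&= n\sum_{i=1}^{k}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i\cdot})^2
\end{aligned}$$

since $\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i\cdot}) = 0$ for each value of i. ■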
j=l 


It is customary to refer to the expression on the left-hand side of the identity of Theorem 15.1 as the total sum of squares, to the first term of the expression on the right-hand side as the treatment sum of squares, and to the second term as the error sum of squares, where "error" denotes the experimental error, or chance. Correspondingly, we denote these three sums of squares by SST, SS(Tr), and SSE, and we can write

SST = SS(Tr) + SSE


Now we have accomplished what we set out to do: We have partitioned SST, a measure of the total variation of the combined data, into two components. The second component, SSE, measures chance variation (namely, the variation within the samples); the first component, SS(Tr), also measures chance variation when the null hypothesis is true, but it also reflects the variation among the population means when the null hypothesis is false.
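The partition can be confirmed numerically on the detergent data of page 499. This plain-Python sketch computes the three sums of squares straight from their definitions: deviations of the observations from the grand mean, of the sample means from the grand mean, and of the observations from their own sample means.

```python
samples = {
    "A": [77, 81, 71, 76, 80],
    "B": [72, 58, 74, 66, 70],
    "C": [76, 85, 82, 80, 77],
}
k, n = len(samples), 5

all_x = [x for xs in samples.values() for x in xs]
grand_mean = sum(all_x) / (k * n)                      # 75.0
means = {g: sum(xs) / n for g, xs in samples.items()}  # 77.0, 68.0, 80.0

sst = sum((x - grand_mean) ** 2 for x in all_x)
ss_tr = n * sum((m - grand_mean) ** 2 for m in means.values())
sse = sum((x - means[g]) ** 2 for g, xs in samples.items() for x in xs)

print(sst, ss_tr, sse)  # 666.0 390.0 276.0
```

As Theorem 15.1 requires, 666 = 390 + 276.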

Since, for each value of i, the $x_{ij}$ are values of a random sample of size n from a normal population with the variance $\sigma^2$, it follows from Theorem 8.10 that for each value of i

$$\frac{1}{\sigma^2}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i\cdot})^2$$

is a random variable having the chi-square distribution with n − 1 degrees of freedom. Furthermore, since the k random samples are independent, it follows from Theorem 8.8 that

$$\frac{1}{\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar{x}_{i\cdot})^2$$

is a random variable having the chi-square distribution with k(n − 1) degrees of freedom. Since the mean of a chi-square distribution equals its degrees of freedom, we find that $\text{SSE}/\sigma^2$ is a value of a random variable having the mean k(n − 1), and, hence, that $\text{SSE}/[k(n-1)]$ can serve as an estimate of $\sigma^2$. This quantity, $\text{SSE}/[k(n-1)]$, is called the error mean square and it is denoted by MSE.


Also, since under the null hypothesis the $\bar{x}_{i\cdot}$ are values of independent random variables having identical normal distributions with the mean μ and the variance $\sigma^2/n$, it follows from Theorem 8.10 that

$$\frac{n}{\sigma^2}\sum_{i=1}^{k}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 = \frac{\text{SS(Tr)}}{\sigma^2}$$

is a random variable having the chi-square distribution with k − 1 degrees of freedom. Since the mean of this distribution is k − 1, it follows that $\text{SS(Tr)}/(k-1)$ provides a second estimate of $\sigma^2$. This quantity, $\text{SS(Tr)}/(k-1)$, is called the treatment mean square and it is denoted by MS(Tr).

Of course, if the null hypothesis is false, then, according to Exercise 1 on page 505, MS(Tr) provides an estimate of $\sigma^2$ plus whatever variation there may be among the population means. This suggests that we reject the null hypothesis that the population means are all equal when MS(Tr) is appreciably greater than MSE. To put this decision on a precise basis, we shall have to assume without proof that the corresponding estimators are independent, for with this assumption we can utilize Theorem 8.13, according to which

$$F = \frac{\dfrac{\text{SS(Tr)}}{(k-1)\sigma^2}}{\dfrac{\text{SSE}}{k(n-1)\sigma^2}} = \frac{\text{MS(Tr)}}{\text{MSE}}$$

is a value of a random variable having the F distribution with k − 1 and k(n − 1) degrees of freedom.† Thus, we reject the null hypothesis that the population means are all equal if the value we obtain for F exceeds $F_{\alpha,\,k-1,\,k(n-1)}$, where α is the level of significance.

The procedure we have described in this section is called a one-way analysis 
of variance, and the necessary details are usually presented in the following kind 
of analysis-of-variance table: 


Source of     Degrees of    Sum of     Mean
variation     freedom       squares    square     F
Treatments    k − 1         SS(Tr)     MS(Tr)     MS(Tr)/MSE
Error         k(n − 1)      SSE        MSE
Total         kn − 1        SST
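A small sketch of how the table is filled in once SS(Tr) and SSE are known; `one_way_anova_table` is a hypothetical helper, and the inputs 390 and 276 are the sums of squares that Example 15.1 obtains for the detergent data with k = 3 and n = 5.

```python
def one_way_anova_table(ss_tr, sse, k, n):
    """Rows (source, df, SS, MS, F) of the one-way analysis-of-variance table."""
    df_tr, df_err = k - 1, k * (n - 1)
    ms_tr, mse = ss_tr / df_tr, sse / df_err  # mean squares: SS divided by df
    return [
        ("Treatments", df_tr, ss_tr, ms_tr, ms_tr / mse),
        ("Error", df_err, sse, mse, None),
        ("Total", k * n - 1, ss_tr + sse, None, None),
    ]

table = one_way_anova_table(390, 276, k=3, n=5)
for row in table:
    print(row)
```

The F entry, about 8.48, is then compared with the tabulated value of $F_{\alpha,\,k-1,\,k(n-1)}$.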


To simplify the calculation of the various sums of squares, we usually use the following computing formulas, which the reader will be asked to derive in Exercise 2 on page 505:

THEOREM 15.2

$$\text{SST} = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}^2 - \frac{1}{kn}T_{\cdot\cdot}^2$$

and

$$\text{SS(Tr)} = \frac{1}{n}\sum_{i=1}^{k} T_{i\cdot}^2 - \frac{1}{kn}T_{\cdot\cdot}^2$$

where $T_{i\cdot}$ is the total of the values obtained for the ith treatment and $T_{\cdot\cdot}$ is the grand total of all nk observations.

Then, the value of SSE can be obtained by subtracting SS(Tr) from SST.
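The computing formulas need only the treatment totals, the grand total, and the sum of the squared observations. A minimal sketch, assuming k samples of equal size n stored as lists; `sums_of_squares` is an illustrative name, and the detergent data of page 499 serve as input.

```python
def sums_of_squares(samples):
    """SST, SS(Tr), SSE for k samples of equal size n, via the totals T_i. and T.."""
    k, n = len(samples), len(samples[0])
    totals = [sum(s) for s in samples]       # T_i.
    grand = sum(totals)                      # T..
    correction = grand ** 2 / (k * n)        # T..^2 / kn
    sst = sum(x * x for s in samples for x in s) - correction
    ss_tr = sum(t * t for t in totals) / n - correction
    return sst, ss_tr, sst - ss_tr           # SSE by subtraction

detergents = [[77, 81, 71, 76, 80], [72, 58, 74, 66, 70], [76, 85, 82, 80, 77]]
print(sums_of_squares(detergents))           # (666.0, 390.0, 276.0)
```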


† A proof of this independence may be found in the book by H. Scheffé listed on page 519.
EXAMPLE 15.1

With reference to the illustration on page 499, test at the 0.01 level of significance whether the differences among the means of the whiteness readings are significant.

Solution

1. $H_0$: $\alpha_i = 0$ for i = 1, 2, 3
   $H_1$: $\alpha_i \neq 0$ for at least one value of i

2. Reject the null hypothesis if F > 6.93, where F is obtained by a one-way analysis of variance and 6.93 is the value of $F_{0.01,2,12}$.

3. The required sums and sums of squares are $T_{1\cdot} = 385$, $T_{2\cdot} = 340$, $T_{3\cdot} = 400$, $T_{\cdot\cdot} = 1{,}125$, and $\sum\sum x_{ij}^2 = 85{,}041$, and substitution of these values together with k = 3 and n = 5 into the formulas of Theorem 15.2 yields

$$\text{SST} = 85{,}041 - \frac{1}{15}(1{,}125)^2 = 666$$

and

$$\text{SS(Tr)} = \frac{1}{5}(385^2 + 340^2 + 400^2) - \frac{1}{15}(1{,}125)^2 = 390$$

Then, by subtraction, SSE = 666 − 390 = 276, and the remaining calculations are shown in the following analysis-of-variance table:

Source of     Degrees of    Sum of     Mean
variation     freedom       squares    square     F
Treatments     2            390        195        8.48
Error         12            276         23
Total         14            666

Note that the mean squares are simply the sums of squares divided by the corresponding degrees of freedom.

4. Since F = 8.48 exceeds 6.93, the null hypothesis must be rejected, and we conclude that the three detergents are not all equally effective. ▲


The parameters of the model on page 499, namely, μ and the $\alpha_i$, are usually estimated by the method of least squares. That is, their estimates are the values which minimize

$$\sum_{i=1}^{k}\sum_{j=1}^{n}[x_{ij} - (\mu + \alpha_i)]^2$$

subject to the restriction that $\sum_{i=1}^{k}\alpha_i = 0$; as the reader will be asked to verify in Exercise 6 below, these least squares estimates are $\hat\mu = \bar{x}_{\cdot\cdot}$ and $\hat\alpha_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}$.
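For the detergent data, these least squares estimates are simple averages. A short plain-Python sketch; note that the estimated treatment effects sum to zero, as the restriction requires.

```python
detergents = [[77, 81, 71, 76, 80], [72, 58, 74, 66, 70], [76, 85, 82, 80, 77]]
k, n = len(detergents), len(detergents[0])

mu_hat = sum(map(sum, detergents)) / (k * n)           # grand mean
alpha_hat = [sum(s) / n - mu_hat for s in detergents]  # sample mean minus grand mean

print(mu_hat, alpha_hat)  # 75.0 [2.0, -7.0, 5.0]
print(sum(alpha_hat))     # 0.0
```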


THEORETICAL EXERCISES

1. For the one-way analysis of variance with k independent samples of size n, show that

$$E\left[\frac{n\sum_{i=1}^{k}(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2}{k-1}\right] = \sigma^2 + \frac{n\sum_{i=1}^{k}\alpha_i^2}{k-1}$$

2. Prove Theorem 15.2.

3. If, in a one-way analysis of variance, the sample sizes are unequal and there are $n_i$ observations for the ith treatment, show that

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot\cdot})^2 = \sum_{i=1}^{k} n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i\cdot})^2$$

analogous to the identity of Theorem 15.1. Also show that the degrees of freedom for SST, SS(Tr), and SSE are, respectively, N − 1, k − 1, and N − k, where $N = \sum_{i=1}^{k} n_i$.

4. With reference to the preceding exercise, show that the computing formulas for the sums of squares are

$$\text{SST} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{1}{N}T_{\cdot\cdot}^2$$

$$\text{SS(Tr)} = \sum_{i=1}^{k}\frac{T_{i\cdot}^2}{n_i} - \frac{1}{N}T_{\cdot\cdot}^2$$

and

SSE = SST − SS(Tr)


5. Show that for k = 2 the F test of a one-way analysis of variance is equivalent to the t test of Section 13.3 with δ = 0.


6. Use Lagrange multipliers to show that the least squares estimates of the parameters of the model on page 499 are $\hat\mu = \bar{x}_{\cdot\cdot}$ and $\hat\alpha_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}$.


APPLIED EXERCISES 


7. To compare the effectiveness of three different types of phosphorescent 
coatings of airplane instrument dials, eight dials each are coated with the 
three types. Then the dials are illuminated by an ultraviolet light, and the 
following are the number of minutes each glowed after the light source was 
shut off: 


Type 1: 52.9, 62.1, 57.4, 50.0, 59.3, 61.2, 60.8, 53.1
Type 2: 58.4, 55.0, 59.8, 62.5, 64.7, 59.9, 54.7, 58.4 
Type 3: 71.3, 66.6, 63.4, 64.7, 75.8, 65.6, 72.9, 67.3 


Test the null hypothesis that there is no difference in the effectiveness of the 
three coatings at the 0.01 level of significance. 


8. The following are the numbers of mistakes made in five successive weeks by 
four technicians working for a medical laboratory: 


Technician I: 13, 16, 12, 14, 15 
Technician II: 14, 16, 11, 19, 15
Technician III: 13, 18, 16, 14, 18 
Technician IV: 18, 10, 14, 15, 12 


Test at the 0.05 level of significance whether the differences among the four 
sample means can be attributed to chance. 


9. Three groups of six guinea pigs each were injected, respectively, with 0.5 mg,
1.0 mg, and 1.5 mg of a new tranquilizer, and the following are the numbers
of minutes it took them to fall asleep:


0.5 mg: 21, 23, 19, 24, 25, 23
1.0 mg: 19, 21, 20, 18, 22, 20
1.5 mg: 15, 10, 13, 14, 11, 15


Test at the 0.05 level of significance whether the null hypothesis that differences
in dosage have no effect can be rejected. Also estimate the parameters
$\mu$, $\alpha_1$, $\alpha_2$, and $\alpha_3$ of the model used in the analysis.
10. The following are the numbers of words per minute which a secretary typed
on several occasions on four different typewriters:


Typewriter C: 71, 75, 69, 77, 61, 72, 71, 78
Typewriter D: 68, 71, 74, 66, 69, 67, 70, 62
Typewriter E: 75, 70, 81, 73, 78, 72
Typewriter F: 62, 59, 71, 68, 63, 65, 72, 60, 64


Use the computing formulas of Exercise 4 to calculate the sums of squares 
required to test at the 0.05 level of significance whether the differences among 
the four sample means can be attributed to chance. 

11. A consumer testing service, wishing to test the accuracy of the thermostats
of three different kinds of electric irons, set them at 480°F and obtained the
following actual temperature readings by means of a thermocouple:


Iron X: 474, 496, 467, 471
Iron Y: 492, 498
Iron Z: 460, 495, 490


Use the computing formulas of Exercise 4 to calculate the sums of squares
required to test at the 0.05 level of significance whether the differences among
the three sample means can be attributed to chance.
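Before proving Exercise 5, it can be checked numerically. The sketch below assumes the t test of Section 13.3 is the two-sample t test with pooled variance (the form for which this equivalence holds); the helper names are ours, and the first two coating types of Exercise 7 are used only as sample input.

```python
import math


def one_way_F(samples):
    """F = MS(Tr)/MSE for a one-way analysis of variance."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    ss_tr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ss_tr / (k - 1)) / (sse / (N - k))


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance for mu1 - mu2 = 0."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))        # pooled standard deviation
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))


x = [52.9, 62.1, 57.4, 50.0, 59.3, 61.2, 60.8, 53.1]   # Type 1 coatings
y = [58.4, 55.0, 59.8, 62.5, 64.7, 59.9, 54.7, 58.4]   # Type 2 coatings
F = one_way_F([x, y])
t = pooled_t(x, y)
print(F, t ** 2)
```

The agreement is exact algebraically: with k = 2, MSE is the pooled variance and SS(Tr) reduces to the squared difference of means divided by $1/n_1 + 1/n_2$.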


15.3 EXPERIMENTAL DESIGN 


In Example 15.1 it may have seemed reasonable to conclude that the three 
detergents are not equally effective; yet, a moment's reflection will show that this 
conclusion is not so “reasonable” at all. For all we know, the swatches cleaned 
with detergent B may have been more soiled than the others, the washing times
may have been longer for detergent C, there may have been differences in water
hardness or water temperature, and even the instruments used to make the
whiteness readings may have gone out of adjustment after the readings for
detergents A and C were made.

It is entirely possible, of course, that the differences among the three sample 
means are due largely to differences in the effectiveness of the detergents, but 
we have just listed several other factors which could be held responsible. It is 
important to remember that a significance test may show that differences among 
sample means are too large to be attributed to chance, but such a test cannot say 
why the differences occurred.

In general, if we want to show that one factor (among various others) can 
be considered the cause of an observed phenomenon, we must somehow make 
sure that none of the other factors can reasonably be held responsible. There are 
various ways in which this can be done; for instance, we can conduct a rigorously 
controlled experiment in which all variables except the one of concern are held
fixed. To do this in the example dealing with the three detergents, we might soil 
the swatches with exactly equal amounts of india ink, always use the same washing 
time, water of exactly the same hardness and temperature, and inspect (and, if 
necessary, adjust) the measuring instruments after each use. Under such rigidly 
controlled conditions, significant differences among the sample means cannot be 
due to differently soiled swatches, or differences in washing time, water tem- 
perature, water hardness, or measuring instruments. On the positive side, the 
differences among the means show that the detergents are not all equally effective 
if they are used in this narrowly restricted way. Of course, we cannot say whether 
the same differences would exist if the washing time were longer or shorter, if 
the water had a different temperature or hardness, and so on. 

In most cases, "overcontrolled" experiments like the one just described do
not really provide us with the kind of information we want. So, we look for 
alternatives, and at the other extreme we can conduct experiments in which none 
of the extraneous factors is controlled, but in which we protect ourselves against 
their effects by randomization. That is, we design, or plan, the experiments in 
such a way that the variations caused by extraneous factors can all be combined 
under the general heading of "chance." For instance, in our example we could 
accomplish this by randomly assigning five of the soiled swatches to each deter- 
gent, and randomly specifying the order in which they are to be washed and 
measured. When all the variations due to uncontrolled extraneous factors can 
thus be included under the heading of chance variation, we refer to the design 
of the experiment as a completely randomized design. 

It should be apparent, however, that randomization protects against the 
effects of the extraneous factors only in a probabilistic sort of way. For instance, 
in our example it is possible, though very unlikely, that detergent A will be 
randomly assigned to the five swatches which happen to be the least soiled, or 
that the water happens to be coldest when we wash the five swatches with detergent
B. It is partly for this reason that we often try to control some of the factors and 
randomize the others, and thus use designs that are somewhere between the two 
extremes which we have described. 

To introduce another important concept in the design of experiments, let 
us consider the following data on the amount of time (in minutes) it took a 
certain person to drive to work, Monday through Friday, along four different 
routes: 


Route 1: 22, 26, 25, 25, 31 
Route 2: 25, 27, 28, 26, 29 
Route 3: 26, 29, 33, 30, 33 
Route 4: 26, 28, 27, 30, 30


The means of these four samples are 25.8, 27.0, 30.2, and 28.2, and since the 
differences among them are fairly large, it would seem reasonable to conclude 
that there are some real differences in the true average time it takes the person
to drive to work along the four different routes. This does not follow, however,
from a one-way analysis of variance. We get F = 2.80, and since this does not
exceed $F_{0.05,3,16} = 3.24$, the null hypothesis cannot be rejected.
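The one-way value F = 2.80 quoted above can be reproduced with a short Python sketch (the helper name is ours). It treats the five days as mere replicates, which is exactly the weakness discussed next. The critical value 3.24 is copied from the text, not computed.

```python
def one_way_anova(samples):
    """Return (SS(Tr), SSE, F) for a one-way analysis of variance."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    ss_tr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    F = (ss_tr / (k - 1)) / (sse / (N - k))
    return ss_tr, sse, F


routes = [
    [22, 26, 25, 25, 31],   # Route 1, Monday through Friday
    [25, 27, 28, 26, 29],   # Route 2
    [26, 29, 33, 30, 33],   # Route 3
    [26, 28, 27, 30, 30],   # Route 4
]
ss_tr, sse, F = one_way_anova(routes)
F_CRIT = 3.24   # F_{0.05,3,16}, taken from the text
print(round(ss_tr, 1), round(sse, 1), round(F, 2), F > F_CRIT)
```

Note that the day-to-day variation ends up in SSE here, inflating the denominator of F.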

Of course, the null hypothesis may be true, but observe that there are not 
only considerable differences among the four means, but also large differences 
among the values within the samples. In the first sample they range from 22 to 
31, in the second sample from 25 to 29, in the third sample from 26 to 33, and 
in the fourth sample from 26 to 30. Not only that, but in each sample the first 
value is the smallest and the last value is the largest. The latter suggests that the 
variation within the samples may well be due to differences in driving conditions 
on the different days of the week. If this is the case, variations due to driving 
conditions were included in the error sum of squares of the one-way analysis of 
variance, the denominator of the F statistic was “inflated,” and this may be why 
the results were not significant. 

To avoid this kind of situation, we could hold the extraneous factor fixed, 
but this will seldom give us the information we want. In our example, we could 
limit the study to driving conditions on Monday, but then we would have no 
assurance that the results would apply also to driving conditions on Tuesday or 
on any other day of the week. Another possibility is to vary the extraneous factor 
deliberately over as wide a range as necessary, so that the variation it causes can
be measured and, hence, eliminated from the error sum of squares. This means 
that we must plan the experiment in such a way that we can perform a two-way 
analysis of variance, in which the total variation of the data is partitioned into 
three components attributed, respectively, to treatments (in our example, the four 
routes), the extraneous factor (in our example, driving conditions on the different 
days of the week), and experimental error, or chance. 

What we have suggested here is called blocking and the different days of 
the week are referred to as blocks. In general, blocks are the levels at which we 
hold an extraneous factor fixed, so that we can measure its contribution to the 
total variation of the data. If each treatment appears the same number of times 
in each block (in our example, each route is used once each day of the week), 
we say that the design of the experiment is a complete block design. Furthermore, 
if the treatments are distributed at random within each block (in our example, 
we would randomly distribute the four routes among the four Mondays, the four 
Tuesdays, etc.), we say that the design of the experiment is a randomized block
design.


15.4 TWO-WAY ANALYSIS OF VARIANCE


There are essentially two different ways of analyzing two-variable experiments,
and they depend on whether the two variables are independent or whether they
interact. To illustrate what we mean here by "interact," suppose that a tire
manufacturer is experimenting with different treads, and that he finds that one 
kind is especially good for use on dirt roads while another kind is especially 
good for use on hard pavement. If this is the case, we say that there is an 
interaction between road conditions and tread design. In this book we shall study 
only the no-interaction case. 

To present the theory of a two-way analysis of variance, we shall use the 
terminology introduced in the preceding sections and refer to the two variables 
as treatments and blocks; alternatively, we could refer to them also as factor A 
and factor B, or as rows and columns. Thus, if $x_{ij}$ for $i = 1, 2, \ldots, k$ and $j =
1, 2, \ldots, n$ are values of independent random variables having normal distributions
with the respective means $\mu_{ij}$ and the common variance $\sigma^2$, we shall consider
the array


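$$\begin{array}{lcccc}
 & \text{Block } 1 & \text{Block } 2 & \cdots & \text{Block } n \\
\text{Treatment } 1 & x_{11} & x_{12} & \cdots & x_{1n} \\
\text{Treatment } 2 & x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
\text{Treatment } k & x_{k1} & x_{k2} & \cdots & x_{kn}
\end{array}$$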


and write the model for a two-way analysis of variance (without interaction) as

$$x_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$$

for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n$. Here $\mu$ is the grand mean, the treatment effects $\alpha_i$ are such that $\sum_{i=1}^{k}\alpha_i = 0$, the block effects $\beta_j$ are such that $\sum_{j=1}^{n}\beta_j = 0$, and the $e_{ij}$ are values of independent random variables having normal distributions with zero means and the common variance $\sigma^2$. Note that

$$\mu_{ij} = \mu + \alpha_i + \beta_j$$

and, as the reader will be asked to verify in Exercise 2 on page 515,

$$\mu = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n}\mu_{ij}}{nk}$$
The two null hypotheses we shall want to test are that the treatment effects
are all equal to zero and that the block effects are all equal to zero, namely,

$$H_0: \alpha_i = 0 \quad \text{for } i = 1, 2, \ldots, k$$

and

$$H_0': \beta_j = 0 \quad \text{for } j = 1, 2, \ldots, n$$

The alternative to $H_0$ is that the treatment effects are not all equal to zero, and
the alternative to $H_0'$ is that the block effects are not all equal to zero. Symbolically,

$$H_1: \alpha_i \neq 0 \quad \text{for at least one value of } i$$

and

$$H_1': \beta_j \neq 0 \quad \text{for at least one value of } j$$


The two-way analysis, itself, is based on the following generalization of 
Theorem 15.1, which the reader will be asked to prove in Exercise 1 on page 515. 


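THEOREM 15.3

$$\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{\cdot\cdot})^2 = n\sum_{i=1}^{k}(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2 + k\sum_{j=1}^{n}(\bar{x}_{\cdot j}-\bar{x}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x}_{i\cdot}-\bar{x}_{\cdot j}+\bar{x}_{\cdot\cdot})^2$$

where $\bar{x}_{i\cdot}$ is the mean of the observations for the ith treatment, $\bar{x}_{\cdot j}$ is the mean of the observations for the jth block, and $\bar{x}_{\cdot\cdot}$ is the mean of all nk observations.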


The expression on the left-hand side of the identity of Theorem 15.3 is the
total sum of squares SST as defined on page 501, and the first term on the
right-hand side is the treatment sum of squares SS(Tr). Measuring the variation
among the $\bar{x}_{\cdot j}$, the second term on the right-hand side is the block sum of squares
SSB, and the third term on the right-hand side is the new error sum of squares
SSE. Thus, we have

$$\mathrm{SST} = \mathrm{SS(Tr)} + \mathrm{SSB} + \mathrm{SSE}$$


and it can be shown that if $H_0$ is true, then $\mathrm{SS(Tr)}/\sigma^2$ and $\mathrm{SSE}/\sigma^2$ are values of
independent random variables having chi-square distributions with $k-1$ and
$(n-1)(k-1)$ degrees of freedom. If $H_0$ is not true, then SS(Tr) will also reflect
the variation among the $\alpha_i$, and according to Theorem 8.13 we reject $H_0$ if
$F_{Tr} \geq F_{\alpha,\,k-1,\,(n-1)(k-1)}$, where

$$F_{Tr} = \frac{\dfrac{\mathrm{SS(Tr)}}{(k-1)\sigma^2}}{\dfrac{\mathrm{SSE}}{(n-1)(k-1)\sigma^2}} = \frac{\mathrm{MS(Tr)}}{\mathrm{MSE}}$$

Here and below, the mean squares are again the respective sums of squares
divided by their degrees of freedom.

Similarly, if $H_0'$ is true, then $\mathrm{SSB}/\sigma^2$ and $\mathrm{SSE}/\sigma^2$ are values of independent random
variables having chi-square distributions with $n-1$ and $(n-1)(k-1)$
degrees of freedom. If $H_0'$ is not true, then SSB will also reflect the variation
among the $\beta_j$, and according to Theorem 8.13 we reject $H_0'$ if $F_B \geq F_{\alpha,\,n-1,\,(n-1)(k-1)}$,
where

$$F_B = \frac{\dfrac{\mathrm{SSB}}{(n-1)\sigma^2}}{\dfrac{\mathrm{SSE}}{(n-1)(k-1)\sigma^2}} = \frac{\mathrm{MSB}}{\mathrm{MSE}}$$


This kind of analysis is called a two-way analysis of variance, and the
necessary details are usually presented in the following kind of analysis-of-
variance table:


Source of      Degrees of        Sum of     Mean      F
variation      freedom           squares    square

Treatments     k - 1             SS(Tr)     MS(Tr)    F_Tr = MS(Tr)/MSE
Blocks         n - 1             SSB        MSB       F_B = MSB/MSE
Error          (n - 1)(k - 1)    SSE        MSE

Total          nk - 1            SST
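The identity of Theorem 15.3 can be checked numerically by computing each sum of squares from its definition. The sketch below (hypothetical function name) assumes a k × n list of lists with treatments as rows and blocks as columns; the route-by-weekday driving times of Section 15.3 serve as input.

```python
def two_way_sums_of_squares(x):
    """SST, SS(Tr), SSB, SSE from the definitions in Theorem 15.3.

    `x` is a k x n list of lists: row i holds treatment i, column j holds block j.
    """
    k, n = len(x), len(x[0])
    grand = sum(map(sum, x)) / (k * n)                                   # x-bar..
    row_means = [sum(row) / n for row in x]                              # x-bar_i.
    col_means = [sum(x[i][j] for i in range(k)) / k for j in range(n)]   # x-bar_.j
    sst = sum((x[i][j] - grand) ** 2 for i in range(k) for j in range(n))
    ss_tr = n * sum((m - grand) ** 2 for m in row_means)
    ssb = k * sum((m - grand) ** 2 for m in col_means)
    sse = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(k) for j in range(n))
    return sst, ss_tr, ssb, sse


times = [  # routes (treatments) by weekday (blocks)
    [22, 26, 25, 25, 31],
    [25, 27, 28, 26, 29],
    [26, 29, 33, 30, 33],
    [26, 28, 27, 30, 30],
]
sst, ss_tr, ssb, sse = two_way_sums_of_squares(times)
print(sst, ss_tr + ssb + sse)   # Theorem 15.3 says these agree
```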
To simplify the calculations, SST and SS(Tr) are usually determined by
means of the formulas of Theorem 15.2, and SSB can be determined by means
of the following formula, which the reader will be asked to derive in Exercise 4
on page 515:


THEOREM 15.4

$$\mathrm{SSB} = \frac{1}{k}\sum_{j=1}^{n}T_{\cdot j}^2 - \frac{1}{kn}T_{\cdot\cdot}^2$$

where $T_{\cdot j}$ is the total of the values obtained for the jth block and $T_{\cdot\cdot}$ is the
grand total of all nk observations.


Then, the value of SSE can be obtained by subtracting SS(Tr) and SSB from SST. 
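The computing route (totals only, with SSE obtained by subtraction) can be sketched as follows. The function name is ours; rows are assumed to be the k treatments and columns the n blocks, and the input is again the route-by-weekday data.

```python
def two_way_computing(x):
    """SST, SS(Tr), SSB via the computing formulas; SSE by subtraction."""
    k, n = len(x), len(x[0])
    grand_total = sum(map(sum, x))                                  # T..
    correction = grand_total ** 2 / (k * n)                         # T..^2 / kn
    sst = sum(v * v for row in x for v in row) - correction
    ss_tr = sum(sum(row) ** 2 for row in x) / n - correction        # uses T_i.
    block_totals = [sum(row[j] for row in x) for j in range(n)]     # T_.j
    ssb = sum(t * t for t in block_totals) / k - correction         # Theorem 15.4
    sse = sst - ss_tr - ssb
    return block_totals, (sst, ss_tr, ssb, sse)


times = [
    [22, 26, 25, 25, 31],
    [25, 27, 28, 26, 29],
    [26, 29, 33, 30, 33],
    [26, 28, 27, 30, 30],
]
block_totals, (sst, ss_tr, ssb, sse) = two_way_computing(times)
print(block_totals, sst, ss_tr, ssb, sse)
```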


EXAMPLE 15.2 


With reference to the illustration on page 508, where we had


            Monday   Tuesday   Wednesday   Thursday   Friday

Route 1       22        26         25          25        31
Route 2       25        27         28          26        29
Route 3       26        29         33          30        33
Route 4       26        28         27          30        30


test at the 0.05 level of significance whether the differences among the means
obtained for the different routes (treatments) are significant, and also whether
the differences among the means obtained for the different days of the week
(blocks) are significant.


Solution
1. $H_0$: $\alpha_i = 0$ for $i = 1, 2, 3, 4$
   $H_0'$: $\beta_j = 0$ for $j = 1, 2, 3, 4, 5$
   $H_1$: $\alpha_i \neq 0$ for at least one value of $i$
   $H_1'$: $\beta_j \neq 0$ for at least one value of $j$
2. Reject the null hypothesis for treatments if $F_{Tr} \geq 3.49$ and reject the
   null hypothesis for blocks if $F_B \geq 3.26$, where $F_{Tr}$ and $F_B$ are obtained
   by means of a two-way analysis of variance and 3.49 and 3.26 are,
   respectively, the values of $F_{0.05,3,12}$ and $F_{0.05,4,12}$.
3. The required sums and sums of squares are $T_{1\cdot} = 129$, $T_{2\cdot} = 135$, $T_{3\cdot} = 151$, $T_{4\cdot} = 141$, $T_{\cdot 1} = 99$, $T_{\cdot 2} = 110$, $T_{\cdot 3} = 113$, $T_{\cdot 4} = 111$, $T_{\cdot 5} = 123$, $T_{\cdot\cdot} = 556$, and $\sum\sum x_{ij}^2 = 15{,}610$, and substitution of these values together with $k = 4$ and $n = 5$ into the formulas of Theorems 15.2 and 15.4 yields

$$\mathrm{SST} = 15{,}610 - \tfrac{1}{20}(556)^2 = 153.2$$

$$\mathrm{SS(Tr)} = \tfrac{1}{5}(129^2 + 135^2 + 151^2 + 141^2) - \tfrac{1}{20}(556)^2 = 52.8$$

$$\mathrm{SSB} = \tfrac{1}{4}(99^2 + 110^2 + 113^2 + 111^2 + 123^2) - \tfrac{1}{20}(556)^2 = 73.2$$

   and, hence,

$$\mathrm{SSE} = 153.2 - 52.8 - 73.2 = 27.2$$

   The remaining calculations are shown in the following analysis-of-variance table:

   Source of     Degrees of   Sum of    Mean                F
   variation     freedom      squares   square

   Treatments     3            52.8     52.8/3 = 17.6       17.6/2.27 = 7.75
   Blocks         4            73.2     73.2/4 = 18.3       18.3/2.27 = 8.06
   Error         12            27.2     27.2/12 = 2.27

   Total         19           153.2

4. Since $F_{Tr} = 7.75$ exceeds 3.49 and $F_B = 8.06$ exceeds 3.26, both null
   hypotheses must be rejected. In other words, the differences among the
   means obtained for the four routes are significant, and so are the
   differences among the means obtained for the different days of the
   week. Note, however, that we cannot conclude that Route 1 is necessarily
   fastest and that on Fridays traffic conditions are always the worst. All
   we have shown by means of the analysis is that differences exist, and
   if we want to go one step further and pinpoint the nature of the
   differences, we will have to use one of the so-called multiple comparisons
   tests referred to on page 520.
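Starting from the sums of squares of step 3, the following sketch (hypothetical helper name) fills in the degrees of freedom, mean squares, and F ratios of the table and compares the ratios with the critical values quoted in step 2; those critical values are copied from the example, not computed.

```python
def two_way_table(ss_tr, ssb, sse, k, n):
    """Degrees of freedom, mean squares, and F ratios for a two-way analysis."""
    df_tr, df_b, df_e = k - 1, n - 1, (n - 1) * (k - 1)
    ms_tr, msb, mse = ss_tr / df_tr, ssb / df_b, sse / df_e
    return {
        "Treatments": (df_tr, ss_tr, ms_tr, ms_tr / mse),
        "Blocks": (df_b, ssb, msb, msb / mse),
        "Error": (df_e, sse, mse, None),
        "Total": (k * n - 1, ss_tr + ssb + sse, None, None),
    }


table = two_way_table(52.8, 73.2, 27.2, k=4, n=5)   # sums of squares from step 3
for source, row in table.items():
    print(source, row)

F_tr, F_b = table["Treatments"][3], table["Blocks"][3]
print(F_tr >= 3.49, F_b >= 3.26)   # 3.49 = F_{0.05,3,12}, 3.26 = F_{0.05,4,12}
```

Carrying MSE unrounded gives F ratios of about 7.76 and 8.07; the table's 7.75 and 8.06 come from dividing by the rounded value 2.27. Both conclusions are unchanged.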


THEORETICAL EXERCISES 


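1. Make use of the identity

$$x_{ij} - \bar{x}_{\cdot\cdot} = (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}) + (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot}) + (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}_{\cdot\cdot})$$

to prove Theorem 15.3.
2. With reference to the notation on page 510 show that

$$\frac{\sum_{i=1}^{k}\sum_{j=1}^{n}\mu_{ij}}{nk} = \mu$$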


3. For the two-way analysis of variance with k treatments and n blocks, show 
that 


4. Prove Theorem 15.4. 

5. A Latin square is a square array in which each letter (or some other kind of
symbol) appears exactly once in each row and once in each column. For
instance,
is a 4 × 4 Latin square. If we look upon the m rows of a Latin square as the
levels of one variable, the m columns as the levels of a second variable, and
A, B, C, ..., as m "treatments," namely, as the levels of a third variable, it
is possible to test hypotheses concerning all three of these variables on the
basis of as few as $m^2$ observations (provided there are no interactions).
Letting $x_{ij(k)}$ denote the observation in the ith row and the jth column of a
Latin square (so that k, denoting the treatment, is determined when we give
i and j), we write the model equation as

$$x_{ij(k)} = \mu + \alpha_i + \beta_j + \tau_k + e_{ij(k)}$$

for $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, m$, and $k = 1, 2, \ldots, m$, where $\mu$ is the grand mean, the row effects $\alpha_i$ are such that $\sum_{i=1}^{m}\alpha_i = 0$, the column effects $\beta_j$ are such that $\sum_{j=1}^{m}\beta_j = 0$, the treatment effects $\tau_k$ are such that $\sum_{k=1}^{m}\tau_k = 0$, and the $e_{ij(k)}$ are values of independent random variables having normal distributions with zero means and the common variance $\sigma^2$. The null hypotheses we shall want to test (against appropriate alternatives) are that the row effects are all zero, that the column effects are all zero, and that the treatment effects are all zero.


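(a) Show that

$$\sum_{i=1}^{m}\sum_{j=1}^{m}(x_{ij(k)}-\bar{x}_{\cdot\cdot})^2 = m\sum_{i=1}^{m}(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2 + m\sum_{j=1}^{m}(\bar{x}_{\cdot j}-\bar{x}_{\cdot\cdot})^2 + m\sum_{k=1}^{m}(\bar{x}_{(k)}-\bar{x}_{\cdot\cdot})^2 + \sum_{i=1}^{m}\sum_{j=1}^{m}(x_{ij(k)}-\bar{x}_{i\cdot}-\bar{x}_{\cdot j}-\bar{x}_{(k)}+2\bar{x}_{\cdot\cdot})^2$$

where $\bar{x}_{(k)}$ is the mean of all the observations for the kth treatment and
the other means are as defined in Theorem 15.3. The expression on the
left-hand side of the above identity is the total sum of squares SST,
while those on the right-hand side are, respectively, the row sum of
squares SSR, the column sum of squares SSC, the treatment sum of
squares SS(Tr), and the error sum of squares SSE.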

(b) Construct an analysis-of-variance table for this kind of experiment,
determining the degrees of freedom for SSE by subtracting those for
SSR, SSC, and SS(Tr) from $m^2 - 1$, the degrees of freedom for SST.
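A numerical check of the identity in part (a), offered as a sketch rather than a proof: it builds a cyclic m × m Latin square (cell (i, j) gets treatment (i + j) mod m, one valid layout among many), fills it with synthetic random data, and confirms that the four components add up to SST. All names are ours, and the data are illustrative only.

```python
import random


def cyclic_latin_square(m):
    """Treatment index (0 = A, 1 = B, ...) for each cell; (i + j) mod m is a Latin square."""
    return [[(i + j) % m for j in range(m)] for i in range(m)]


def is_latin_square(sq):
    """True if each symbol 0..m-1 appears exactly once in every row and every column."""
    m = len(sq)
    symbols = set(range(m))
    if not all(len(row) == m and set(row) == symbols for row in sq):
        return False
    return all({sq[i][j] for i in range(m)} == symbols for j in range(m))


def latin_square_ss(x, sq):
    """SST, SSR, SSC, SS(Tr), SSE as in part (a)."""
    m = len(x)
    grand = sum(map(sum, x)) / m ** 2
    row = [sum(x[i]) / m for i in range(m)]
    col = [sum(x[i][j] for i in range(m)) / m for j in range(m)]
    trt_total = [0.0] * m
    for i in range(m):
        for j in range(m):
            trt_total[sq[i][j]] += x[i][j]
    trt = [t / m for t in trt_total]              # each treatment occurs m times
    cells = [(i, j) for i in range(m) for j in range(m)]
    sst = sum((x[i][j] - grand) ** 2 for i, j in cells)
    ssr = m * sum((r - grand) ** 2 for r in row)
    ssc = m * sum((c - grand) ** 2 for c in col)
    sstr = m * sum((t - grand) ** 2 for t in trt)
    sse = sum((x[i][j] - row[i] - col[j] - trt[sq[i][j]] + 2 * grand) ** 2
              for i, j in cells)
    return sst, ssr, ssc, sstr, sse


random.seed(1)
m = 4
sq = cyclic_latin_square(m)
x = [[random.gauss(50, 5) for _ in range(m)] for _ in range(m)]
sst, ssr, ssc, sstr, sse = latin_square_ss(x, sq)
print(is_latin_square(sq), sst, ssr + ssc + sstr + sse)
```

The cross-product terms vanish only because every treatment appears once in each row and column; with an arbitrary assignment of treatments to cells the identity generally fails.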


APPLIED EXERCISES 


6. An experiment was performed to judge the effect of four different fuels and
three different types of launchers on the range of a certain rocket. Test, on
the basis of the following ranges, in miles, whether there is a significant effect
due to differences in fuels and whether there is a significant effect due to
differences in launchers:


              Fuel 1   Fuel 2   Fuel 3   Fuel 4


Launcher X 


Launcher Y 


Launcher Z 


Use the 0.01 level of significance. 


7. The following are the cholesterol contents, in milligrams per package, which 
four laboratories obtained for 6-ounce packages of three very similar diet 
foods: 


              Diet food A   Diet food B   Diet food C


Laboratory 1 
Laboratory 2 
Laboratory 3 
Laboratory 4 


Perform a two-way analysis of variance and test the null hypotheses concerning
the diet foods and the laboratories at the 0.05 level of significance.


8. A laboratory technician measures the breaking strength of each of five kinds
of linen threads by using four different measuring instruments, $I_1$, $I_2$, $I_3$, and
$I_4$, and obtains the following results, in ounces:


              I1     I2     I3     I4


Thread 1 
Thread 2 
Thread 3 
Thread 4 
Thread 5 


Perform a two-way analysis of variance, using the 0.05 level of significance
for both tests.
9. The sample data in the following Latin square (see Exercise 5) are the grades
in an American history test obtained by nine college students of various
ethnic backgrounds and of various professional interests, who were taught
by instructors A, B, and C:


(Latin square table: columns are ethnic backgrounds, Mexican, German, and Polish; rows are professional interests, among them Medicine and Engineering; each cell gives the instructor, A, B, or C, and the grade.)


Analyze these data by the method of Exercise 5 and test the following 
hypotheses at the 0.05 level of significance: 


(a) Having a different instructor has no effect on the grades. 
(b) Differences in ethnic background have no effect on the grades. 
(c) Differences in professional interest have no effect on the grades. 


10. Among the nine persons interviewed in a poll, three are Easterners, three are 
Southerners, and three are Westerners. By profession, three of them are 
teachers, three are lawyers, and three are doctors, and no two of the same 
profession come from the same part of the United States. Also, three are 
Democrats, three are Republicans, and three are Independents, and no two 
of the same political affiliation are of the same profession or come from the 
same part of the United States. If one of the teachers is an Easterner and an 
Independent, another teacher is a Southerner and a Republican, and one of 
the lawyers is a Southerner and a Democrat, what is the political affiliation 
of the doctor who is a Westerner? [Hint: Construct a Latin square (see 
Exercise 5) with m = 3.] This exercise is a simplified version of a famous 
problem posed by R. A. Fisher in his classical work, The Design of Experiments. 


15.5 SOME FURTHER CONSIDERATIONS 


In this chapter we have presented a brief introduction to some of the basic 
methods and ideas of analysis of variance and experimental design. The scope 
of these subjects, which are closely interrelated, is vast, and new methods are 
constantly being developed as their need arises in experimentation. 
The designs which we have discussed all had the special feature that there
were observations corresponding to all possible combinations of the values 
(levels) of the variables under consideration. To show that this can be very 
impractical or even physically impossible, we have only to consider an experiment 
in which we want to compare the yield of 25 varieties of wheat and, at the same 
time, the effect of 12 different fertilizers. To perform an experiment in which 
each of the 25 varieties of wheat is used in conjunction with each of the 12 
fertilizers, we would have to plant 300 plots, and it does not require much 
imagination to see how difficult it would be to find that many test plots for which 
soil composition, irrigation, slope, ..., are constant or otherwise controllable.
Consequently, there is a need for designs which make it possible to test hypotheses 
concerning the most relevant (though not all) parameters of the model on the 
basis of experiments which are feasible from a practical point of view. This leads 
to so-called incomplete block designs, which are discussed in the general references 
on experimental design listed at the end of the chapter. 

Further complications arise when there are extraneous variables which can 
be measured but not controlled. For example, in a comparison of various kinds 
of "teaching machines" it may be impossible to use persons who all have the 
same I.Q., but at least their I.Q.'s can be determined. In a situation like that we
might use an analysis-of-covariance model such as

$$x_{ij} = \mu + \alpha_i + \beta y_{ij} + e_{ij}$$

which differs from the one-way analysis of variance model in that we added the
term $\beta y_{ij}$, where the $y_{ij}$ are the given I.Q.'s. Note that in this model the estimation
of $\beta$ is essentially a problem of regression.
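To make the regression remark concrete: since $\mu + \alpha_i$ acts as a separate intercept for each treatment, minimizing $\sum\sum(x_{ij} - \mu - \alpha_i - \beta y_{ij})^2$ gives the pooled within-group slope $\hat{\beta} = \sum\sum(x_{ij}-\bar{x}_{i\cdot})(y_{ij}-\bar{y}_{i\cdot}) / \sum\sum(y_{ij}-\bar{y}_{i\cdot})^2$. The sketch below computes it; the function name is ours, and the I.Q. values and scores are hypothetical, constructed so that the true slope is 0.5.

```python
def ancova_slope(groups):
    """Pooled within-group least squares slope for x_ij = mu + alpha_i + beta*y_ij + e_ij.

    `groups` is a list of (x_values, y_values) pairs, one pair per treatment.
    """
    sxy = syy = 0.0
    for xs, ys in groups:
        xbar = sum(xs) / len(xs)
        ybar = sum(ys) / len(ys)
        sxy += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        syy += sum((y - ybar) ** 2 for y in ys)
    return sxy / syy


# Hypothetical I.Q.'s for three "teaching machines"; scores are exactly 10 + 5*i + 0.5*IQ.
iqs = [[95, 105, 110, 120], [90, 100, 115, 125], [98, 102, 108, 130]]
groups = [([10 + 5 * i + 0.5 * y for y in ys], ys) for i, ys in enumerate(iqs)]
beta_hat = ancova_slope(groups)
print(beta_hat)
```

Because the treatment effects are removed by centering within each group, different group levels (the 5*i terms) do not bias the slope.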

Other difficulties arise when the parameters $\alpha_i$ and $\beta_j$ in an analysis-of-
variance model are not constants, but values of random variables. This kind of
situation would arise, for example, if there are 25 varieties of wheat and 12 kinds 
of fertilizers and we randomly select, say, six of the varieties of wheat and three 
of the fertilizers to be included in an experiment. 

These are just some of the generalizations of the methods we have presented 
in this chapter; they are treated in detail in the general texts on analysis of 
variance and experimental design which are listed below. 
16


Nonparametric Methods


16.1 INTRODUCTION


Most of the tests discussed in the three preceding chapters required specific 
assumptions about the population, or populations, sampled. In most cases we 
assumed that the populations sampled are normal; sometimes we assumed that 
their standard deviations are known or are known to be equal; and sometimes 
we assumed that the samples are independent. Since there are many situations 
in which the required assumptions cannot be met, statisticians have developed 
alternative techniques which have become known as nonparametric methods. This
term is used somewhat loosely to include distribution-free methods (like the
tolerance limits of Exercise 10 on page 303) where we make no assumptions
about the populations, except perhaps that they are continuous. It also includes
methods which are nonparametric only in the sense that we are not concerned
with the parameters of populations of a given kind.

Aside from the fact that nonparametric methods can be used under more
general conditions than the standard techniques which they replace, they have
great intuitive appeal; that is, they are easy to explain and easy to understand.
Moreover, in many nonparametric methods the computational burden is so light
that they come under the heading of "quick and easy" or "short-cut" techniques.
For these reasons, nonparametric methods have become quite popular, and an
extensive literature is devoted to their theory and application.
The main disadvantage of nonparametric methods is that they may be
wasteful of information, and thus less efficient than the standard techniques 
which they replace. It should be observed, however, that such efficiency com- 
parisons usually assume that the conditions underlying the standard methods are 
met and, hence, they tend to understate the real worth of the nonparametric 
methods. To put this another way, it is true in general that the less one assumes, 
the less one can infer from a set of data, but it is also true that the less one assumes, 
the more one broadens the applicability of one's method. 


16.2 THE SIGN TEST


The standard t test of the null hypothesis $\mu = \mu_0$ is based on the assumption
that we are sampling a normal population. When this assumption is untenable,
this standard test can be replaced by any one of several nonparametric alternatives,
among them the one-sample sign test.

The one-sample sign test applies when we sample a continuous symmetrical
population,¹ so that the probability of getting a sample value exceeding the mean
and the probability of getting a sample value less than the mean are both $\frac{1}{2}$. To
test the null hypothesis $\mu = \mu_0$ against an appropriate alternative on the basis
of a random sample of size n, we replace each sample value exceeding $\mu_0$ with
a plus sign and each sample value less than $\mu_0$ with a minus sign, and then we
test the null hypothesis that the number of plus signs is the value of a random
variable having a binomial distribution with the parameters n and $\theta = \frac{1}{2}$. The
two-sided alternative $\mu \neq \mu_0$ thus becomes $\theta \neq \frac{1}{2}$, and the one-sided alternatives
$\mu < \mu_0$ and $\mu > \mu_0$ become $\theta < \frac{1}{2}$ and $\theta > \frac{1}{2}$, respectively. If a sample value
actually equals $\mu_0$, which does not have zero probability when we deal with
rounded data even though the population is continuous, we simply discard it.

To perform a one-sample sign test when the sample is very small, we refer 
directly to a table of binomial probabilities such as Table I; when the sample is 
large, we use the normal approximation to the binomial distribution. 


EXAMPLE 16.1 


The following are measurements of the breaking strength of a certain kind of 
2-inch cotton ribbon in pounds: 


163 165 160 189 161 171 158 151 169 162 
163 139 172 165 148 166 172 163 187 173 


¹If it cannot be assumed that the population is symmetrical, we use the same
technique but apply it to the null hypothesis $\tilde{\mu} = \tilde{\mu}_0$ instead of $\mu = \mu_0$, where $\tilde{\mu}$ is the
population median.
Test the null hypothesis $\mu = 160$ against the alternative $\mu > 160$ at the level of
significance $\alpha = 0.05$.


Solution
1. $H_0$: $\mu = 160$
   $H_1$: $\mu > 160$
2. Reject the null hypothesis if $x \geq k_{0.05}$, where x is the number of plus
   signs and $k_{0.05}$ is as defined on page 429.
3. Replacing each value exceeding 160 with a plus sign, each value less
   than 160 with a minus sign, and discarding the one value which equals
   160, we get

   + + + + + − − + + + − + + − + + + + +

   so that n = 19 and x = 15. From Table I we find that $k_{0.05} = 14$ for
   n = 19.
4. Since x = 15 exceeds $k_{0.05} = 14$, the null hypothesis must be rejected
   and we conclude that the mean breaking strength of the given kind of
   ribbon exceeds 160 pounds.
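The steps of Example 16.1 can be sketched in Python using exact binomial probabilities instead of Table I. The sketch assumes $k_\alpha$ is the smallest integer k for which $P(X \geq k) \leq \alpha$ when X is binomial with $\theta = \frac{1}{2}$, which reproduces the value 14 quoted above; the function names are ours.

```python
from math import comb


def upper_tail(n, k, theta=0.5):
    """P(X >= k) for X binomial with parameters n and theta."""
    return sum(comb(n, j) * theta ** j * (1 - theta) ** (n - j) for j in range(k, n + 1))


def critical_k(n, alpha, theta=0.5):
    """Smallest k with P(X >= k) <= alpha (k = n + 1 always qualifies)."""
    for k in range(n + 2):
        if upper_tail(n, k, theta) <= alpha:
            return k


strengths = [163, 165, 160, 189, 161, 171, 158, 151, 169, 162,
             163, 139, 172, 165, 148, 166, 172, 163, 187, 173]
mu0 = 160
plus = [v > mu0 for v in strengths if v != mu0]   # values equal to mu0 are discarded
n, x = len(plus), sum(plus)
k = critical_k(n, 0.05)
p_value = upper_tail(n, x)                        # P(X >= 15) for n = 19
print(n, x, k, x >= k, p_value)
```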


EXAMPLE 16.2 


The following data, in tons, are the amounts of sulfur oxides emitted by a large 
industrial plant in 40 days: 


17  15  20  29  19  18  22  25  27   9
24  20  17   6  24  14  15  23  24  26
19  23  28  19  16  22  24  17  20  13
19  10  23  18  31  13  20  17  24  14


Test the null hypothesis $\mu = 21.5$ against the alternative hypothesis $\mu < 21.5$ at
the level of significance $\alpha = 0.01$.


Solution
1. $H_0$: $\mu = 21.5$
   $H_1$: $\mu < 21.5$
2. Reject the null hypothesis if $z \leq -z_{0.01} = -2.33$, where

$$z = \frac{(x + \frac{1}{2}) - n\theta}{\sqrt{n\theta(1-\theta)}}$$

   $\theta = \frac{1}{2}$, and x is the number of plus signs (that is, values exceeding 21.5).
3. Since n = 40 and x = 16, we get $n\theta = 40 \cdot \frac{1}{2} = 20$, $\sqrt{n\theta(1-\theta)} =
   \sqrt{40(0.5)(0.5)} = 3.16$, and hence

$$z = \frac{(16 + \frac{1}{2}) - 20}{3.16} = -1.11$$

4. Since z = −1.11 is not less than $-z_{0.01} = -2.33$, the null hypothesis
   cannot be rejected.
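The normal-approximation version can be sketched as follows, using only x = 16 and n = 40 from step 3. The function name is ours; `statistics.NormalDist` from the Python standard library supplies the quantile in place of a normal table. For the opposite alternative ($\theta > \frac{1}{2}$) the continuity correction would be $x - \frac{1}{2}$ instead.

```python
import math
from statistics import NormalDist


def sign_test_z_lower(x, n, theta=0.5):
    """z for the alternative theta < 1/2, with continuity correction x + 1/2."""
    return ((x + 0.5) - n * theta) / math.sqrt(n * theta * (1 - theta))


z = sign_test_z_lower(16, 40)
z_crit = NormalDist().inv_cdf(0.01)   # -z_{0.01}, about -2.33
print(round(z, 2), round(z_crit, 2), z <= z_crit)
```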


The sign test can also be used when we deal with paired data as in Exercises 
14 and 15 on page 424. In such problems, each pair of sample values is replaced 
by a plus sign if the difference between the paired observations is positive 
(that is, if the first value exceeds the second value) and by a minus sign if the 
difference between the paired observations is negative (that is, if the first value 
is less than the second value). To test the null hypothesis that two continuous 
symmetrical populations have equal means, we can thus use the sign test, which, 
in connection with this kind of problem, is referred to as the paired-sample sign 
test. If the difference between a pair of observations is zero, we discard it. 


EXAMPLE 16.3 


To determine the effectiveness of a new traffic control system, the number of 
accidents that occurred at 12 dangerous intersections during four weeks before 
and four weeks after the installation of the new system was observed, and the 
following data were obtained: 


3 and 1, 5 and 2, 2 and 0, 3 and 2, 3 and 2, 3 and 0,
0 and 2, 4 and 3, 1 and 3, 6 and 4, 4 and 1, 1 and 0

Use the paired-sample sign test to test the null hypothesis that the new traffic
control system is not effective at $\alpha = 0.05$. (In this case, the populations sampled
are, of course, not continuous, but this does not matter since zero differences are
discarded.)


Solution
1. $H_0$: $\mu_1 = \mu_2$
$H_1$: $\mu_1 > \mu_2$
2. Reject the null hypothesis if $x \ge k_{0.05}$, where x is the number of plus
signs (positive differences) and $k_{0.05}$ is as defined on page 429.
3. Replacing each pair of values by the sign of their difference, we get

+ + + + + + − + − + + +

so that n = 12 and x = 10. From Table I we find that $k_{0.05} = 10$ for
n = 12.
4. Since x = 10 equals $k_{0.05} = 10$, the null hypothesis must be rejected and
we conclude that the new traffic control system is effective in reducing
the number of accidents at dangerous intersections. ▲
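The Table I value used in step 3 can be checked with exact binomial arithmetic. The sketch below assumes that $k_\alpha$ denotes the smallest integer for which $P(X \ge k_\alpha) \le \alpha$ when X is binomial with $\theta = \frac{1}{2}$; the helper name `upper_critical_k` is hypothetical.

```python
from fractions import Fraction
from math import comb


def upper_critical_k(n, alpha):
    """Smallest k with P(X >= k) <= alpha for X binomial(n, 1/2).

    Returns None when even P(X >= n) exceeds alpha.
    """
    alpha = Fraction(str(alpha))       # exact arithmetic, e.g. 0.05 -> 1/20
    tail = Fraction(0)                 # running value of P(X >= k)
    best = None
    for k in range(n, -1, -1):
        tail += Fraction(comb(n, k), 2 ** n)
        if tail > alpha:
            break
        best = k
    return best


k = upper_critical_k(12, 0.05)
p_value = sum(Fraction(comb(12, j), 2 ** 12) for j in range(10, 13))
print(k, float(p_value))  # 10 0.019287109375
```

Since $P(X \ge 9) = 299/4096 \approx 0.073$ already exceeds 0.05, the critical value cannot be lowered to 9.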


16.3 THE SIGNED-RANK TEST 


As we saw in the preceding section, the sign test is very easy to perform, but
since we utilize only the signs of the differences between the observations and
$\mu_0$ in the one-sample case, or the signs of the differences between the pairs of
observations in the paired-sample case, it tends to be wasteful of information.
An alternative nonparametric test, the Wilcoxon signed-rank test, is less wasteful
in that it also takes into account the magnitudes of the differences. In this test,
we rank the differences without regard to their signs, assigning rank 1 to the
smallest difference in absolute value, rank 2 to the second smallest difference in
absolute value, ..., and rank n to the largest difference in absolute value. Zero
differences are again discarded, and if the absolute values of two or more
differences are the same, we assign each one the mean of the ranks which they
jointly occupy. Then, the signed-rank test is based on $T^+$, the sum of the ranks
assigned to the positive differences, $T^-$, the sum of the ranks assigned to the negative
differences, $T^+ - T^-$, or $T = \min(T^+, T^-)$. Since $T^+ + T^- = \frac{n(n+1)}{2}$, the
resulting tests are all equivalent.

Since $T^+$ and $T^-$ both take on values on the interval from 0 to $\frac{n(n+1)}{2}$
and their distributions are symmetrical about $\frac{n(n+1)}{4}$, we can picture the
relationship between the distributions of $T^+$, $T^-$, and T as in Figure 16.1 for n = 5.
Regardless of the alternative hypothesis, we can base all tests of the null
hypothesis $\mu = \mu_0$ on the distribution of T, but we have to be careful to use the
right statistic and the right critical value of T, as summarized in the following table:


Alternative hypothesis     Reject the null hypothesis if:
$\mu \ne \mu_0$            $T \le T_\alpha$
$\mu > \mu_0$              $T^- \le T_{2\alpha}$
$\mu < \mu_0$              $T^+ \le T_{2\alpha}$
Figure 16.1 Distributions of $T^+$, $T^-$, and T for n = 5. (Upper panel: distribution of $T^+$ or $T^-$, values 0 to 15; lower panel: distribution of T.)


where the level of significance is $\alpha$ for each test. The critical values of T, which
are such that $T_\alpha$ is the largest value for which $P(T \le T_\alpha)$ does not exceed $\alpha$, are
given in Table IX at the end of the book. Note that the same critical values serve
for tests at different levels of significance depending on whether the alternative
hypothesis is one-sided or two-sided.
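Because every one of the $2^n$ assignments of signs to the ranks is equally likely under the null hypothesis, the exact distribution of $T^+$, and hence a Table IX style critical value $T_\alpha$, can be computed by counting subsets of $\{1, \ldots, n\}$ with a given sum. The sketch below is illustrative only; the function names are hypothetical and the book's tables remain the reference.

```python
from fractions import Fraction


def t_plus_counts(n):
    """counts[s] = number of the 2**n equally likely sign assignments
    (under the null hypothesis) for which T+ = s."""
    total = n * (n + 1) // 2
    counts = [1] + [0] * total
    for rank in range(1, n + 1):          # each rank is either in T+ or not
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def prob_T_at_most(n, t):
    """Exact P(T <= t) where T = min(T+, T-) and T- = n(n+1)/2 - T+."""
    counts = t_plus_counts(n)
    total = n * (n + 1) // 2
    hits = sum(c for s, c in enumerate(counts) if min(s, total - s) <= t)
    return Fraction(hits, 2 ** n)


def critical_T(n, alpha):
    """Largest t with P(T <= t) <= alpha, or None if there is none."""
    alpha = Fraction(str(alpha))
    best = None
    for t in range(n * (n + 1) // 4 + 1):  # T never exceeds n(n+1)/4
        if prob_T_at_most(n, t) > alpha:
            break
        best = t
    return best


print(critical_T(14, 0.05))  # 21
```

For a one-sided alternative the table calls for $T_{2\alpha}$, so a one-sided test at $\alpha = 0.025$ would use `critical_T(n, 0.05)`.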


EXAMPLE 16.4 


The following are fifteen measurements of the octane rating of a certain kind of 
gasoline: 97.5, 95.2, 97.3, 96.0, 96.8, 100.3, 97.4, 95.3, 93.2, 99.1, 96.1, 97.6, 98.2, 
98.5, and 94.9. Use the signed-rank test at the 0.05 level of significance to test 
whether or not the mean octane rating of the given kind of gasoline is 98.5. 




Solution
1. $H_0$: $\mu = 98.5$
$H_1$: $\mu \ne 98.5$
2. Reject the null hypothesis if $T \le T_{0.05}$, where $T_{0.05}$ must be read from
Table IX for the appropriate value of n.
3. Subtracting 98.5 from each value and ranking the differences without
regard to their sign, we get

Measurement   Difference   Rank
97.5          -1.0           4
95.2          -3.3          12
97.3          -1.2           6
96.0          -2.5          10
96.8          -1.7           7
100.3          1.8           8
97.4          -1.1           5
95.3          -3.2          11
93.2          -5.3          14
99.1           0.6           2
96.1          -2.4           9
97.6          -0.9           3
98.2          -0.3           1
98.5           0.0
94.9          -3.6          13

so that $T^- = 4 + 12 + 6 + 10 + 7 + 5 + 11 + 14 + 9 + 3 + 1 + 13 = 95$,
$T^+ = 8 + 2 = 10$, and T = 10. From Table IX we find that
$T_{0.05} = 21$ for n = 14.
4. Since T = 10 is less than $T_{0.05} = 21$, the null hypothesis must be rejected;
the mean octane rating of the given kind of gasoline is not 98.5. ▲
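The bookkeeping in step 3 (discard zero differences, rank absolute values with mean ranks for ties, then sum by sign) can be sketched in Python as follows. The helper names are hypothetical; rounding the differences to nine decimals is an assumption made here so that floating-point noise does not hide zero differences or ties.

```python
def average_ranks(values):
    """Ranks 1..len(values); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1       # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_statistics(sample, mu0):
    """Return (n, T+, T-) for the one-sample signed-rank test of mu = mu0."""
    diffs = [round(x - mu0, 9) for x in sample]   # guard against float noise
    diffs = [d for d in diffs if d != 0]          # zero differences are discarded
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), t_plus, t_minus


octane = [97.5, 95.2, 97.3, 96.0, 96.8, 100.3, 97.4, 95.3, 93.2, 99.1,
          96.1, 97.6, 98.2, 98.5, 94.9]
n, t_plus, t_minus = signed_rank_statistics(octane, 98.5)
print(n, t_plus, t_minus, min(t_plus, t_minus))  # 14 10.0 95.0 10.0
```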


When we deal with paired data, the signed-rank test can also be used in
place of the paired-sample sign test. In that case, we test the null hypothesis
$\mu_1 = \mu_2$ using the test criteria given in the table on page 525, except that the
alternative hypotheses are now $\mu_1 \ne \mu_2$, $\mu_1 > \mu_2$, or $\mu_1 < \mu_2$ instead of $\mu \ne \mu_0$,
$\mu > \mu_0$, or $\mu < \mu_0$.

For n > 15 it is considered reasonable to assume that the distribution of
$T^+$ is approximately normal. To perform the signed-rank test based on this
assumption, we need the following results, which apply regardless of whether
the null hypothesis is $\mu = \mu_0$ or $\mu_1 = \mu_2$:


THEOREM 16.1 The mean and the variance of $T^+$ are

$$E(T^+) = \frac{n(n+1)}{4}$$

and

$$\operatorname{var}(T^+) = \frac{n(n+1)(2n+1)}{24}$$


Proof. Expressed in terms of ranks and signed differences, the null
hypotheses for the one-sample and paired-sample signed-rank tests may be
stated as follows: For each rank, the probabilities that it will be assigned
to a positive difference or to a negative difference are both $\frac{1}{2}$. Thus, we can
write

$$T^+ = 1 \cdot x_1 + 2 \cdot x_2 + \cdots + n \cdot x_n$$

where $x_1, x_2, \ldots,$ and $x_n$ are independent random variables having the
Bernoulli distribution with $\theta = \frac{1}{2}$. Since $E(x_i) = \theta = \frac{1}{2}$ and $\operatorname{var}(x_i) =
\theta(1-\theta) = \frac{1}{4}$ for $i = 1, 2, \ldots, n$ by Theorem 5.2 with n = 1, it follows that

$$E(T^+) = 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{2} + \cdots + n \cdot \tfrac{1}{2} = \frac{1 + 2 + \cdots + n}{2} = \frac{n(n+1)}{4}$$

Also, according to the corollary to Theorem 4.14 on page 167, we find that

$$\operatorname{var}(T^+) = 1^2 \cdot \tfrac{1}{4} + 2^2 \cdot \tfrac{1}{4} + \cdots + n^2 \cdot \tfrac{1}{4} = \frac{1^2 + 2^2 + \cdots + n^2}{4} = \frac{n(n+1)(2n+1)}{24}$$

We made use here of the familiar formulas for the sum and the sum
of the squares of the first n positive integers, which are proved in
Appendix II. ∎
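Theorem 16.1 can also be confirmed for small n by brute force: enumerate the exact null distribution of $T^+$ (each of the $2^n$ sign assignments equally likely) and compare its mean and variance with the two formulas in exact rational arithmetic. This is a sanity check under that equal-likelihood assumption, not a substitute for the proof; the function names are hypothetical.

```python
from fractions import Fraction


def t_plus_counts(n):
    """counts[s] = number of sign assignments (out of 2**n) with T+ = s."""
    total = n * (n + 1) // 2
    counts = [1] + [0] * total
    for rank in range(1, n + 1):
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def exact_mean_var(n):
    """Mean and variance of T+ computed directly from its distribution."""
    counts = t_plus_counts(n)
    size = 2 ** n
    mean = Fraction(sum(s * c for s, c in enumerate(counts)), size)
    second = Fraction(sum(s * s * c for s, c in enumerate(counts)), size)
    return mean, second - mean ** 2


for n in range(1, 13):
    mean, var = exact_mean_var(n)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print("Theorem 16.1 agrees with exact enumeration for n = 1, ..., 12")
```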


EXAMPLE 16.5


The following are the weights in pounds, before and after, of 16 persons who 
stayed on a certain reducing diet for four weeks: 


Before After 
147.0 137.9 
183.5 176.2
232.1 219.0 
161.6 163.8 
197.5 193.5 
206.3 201.4 
177.0 180.6 
215.4 203.2 
147.7 149.0 
208.1 195.4 
166.8 158.5 
131.9 134.4 
150.3 149.3 
197.2 189.1 
159.8 159.1 
171.7 173.2


Use the signed-rank test to test at the 0.05 level of significance whether the
weight-reducing diet is effective.


Solution
1. $H_0$: $\mu_1 = \mu_2$
$H_1$: $\mu_1 > \mu_2$
2. Reject the null hypothesis if $z \ge z_{0.05} = 1.645$, where

$$z = \frac{T^+ - E(T^+)}{\sqrt{\operatorname{var}(T^+)}}$$

3. The differences between the respective pairs are 9.1, 7.3, 13.1, −2.2, 4.0,
4.9, −3.6, 12.2, −1.3, 12.7, 8.3, −2.5, 1.0, 8.1, 0.7, and −1.5, and if their
absolute values are ranked, we find that the positive differences occupy
ranks 13, 10, 16, 8, 9, 14, 15, 12, 2, 11, and 1. Thus,

$$T^+ = 13 + 10 + 16 + 8 + 9 + 14 + 15 + 12 + 2 + 11 + 1 = 111$$

Since $E(T^+) = \frac{16 \cdot 17}{4} = 68$ and $\operatorname{var}(T^+) = \frac{16 \cdot 17 \cdot 33}{24} = 374$, we get

$$z = \frac{111 - 68}{\sqrt{374}} = 2.22$$

4. Since z = 2.22 exceeds $z_{0.05} = 1.645$, the null hypothesis must be
rejected; we conclude that the diet is, indeed, effective in reducing
weight. ▲
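The large-sample signed-rank test for paired data in Example 16.5 can be sketched in Python as below: form the before-minus-after differences, rank their absolute values, add the ranks of the positive differences, and standardize with the moments of Theorem 16.1. The helper name is hypothetical, and, as in the example, no continuity correction is applied.

```python
import math


def average_ranks(values):
    """Ranks 1..len(values); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


before = [147.0, 183.5, 232.1, 161.6, 197.5, 206.3, 177.0, 215.4,
          147.7, 208.1, 166.8, 131.9, 150.3, 197.2, 159.8, 171.7]
after = [137.9, 176.2, 219.0, 163.8, 193.5, 201.4, 180.6, 203.2,
         149.0, 195.4, 158.5, 134.4, 149.3, 189.1, 159.1, 173.2]

diffs = [round(b - a, 9) for b, a in zip(before, after)]
diffs = [d for d in diffs if d != 0]          # zero differences are discarded
ranks = average_ranks([abs(d) for d in diffs])
n = len(diffs)
t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)

mean = n * (n + 1) / 4                        # Theorem 16.1
var = n * (n + 1) * (2 * n + 1) / 24
z = (t_plus - mean) / math.sqrt(var)
print(t_plus, mean, var, round(z, 2))         # 111.0 68.0 374.0 2.22
```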


16.4 RANK-SUM TESTS: THE U TEST


In this section we shall present a nonparametric alternative to the two-sample t 
test, which is called the U test, the Wilcoxon test, or the Mann-Whitney test, 
named after the statisticians who contributed to its development. Without having 
to assume that the two populations sampled have normal distributions, we will 
be able to test the null hypothesis that we are sampling identical continuous 
populations against the alternative that the two populations have unequal means. 

To illustrate the procedure, suppose that we want to compare two kinds of 
emergency flares on the basis of the following burning times (rounded to the 
nearest tenth of a minute): 


Brand A: 14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9
Brand B: 15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4


Arranging these values jointly (as if they were one sample) in an increasing order 
of magnitude and assigning them in this order the ranks 1, 2, 3, ..., and 19, we 
find that the values of the first sample (Brand A) occupy ranks 1, 3, 4, 5, 7, 10, 
12, 13, and 14, while those of the second sample (Brand B) occupy ranks 2, 6, 
8, 9, 11, 15, 16, 17, 18, and 19. Had there been ties, we would have assigned to 
each of the tied observations the mean of the ranks which they jointly occupy. 

If there is an appreciable difference between the means of the two populations,
most of the lower ranks are likely to go to the values of one sample, while
most of the higher ranks are likely to go to the values of the other sample. As
originally proposed by Wilcoxon, the test is thus based on the value of $W_1$, the
sum of the ranks of the values of the first sample, or $W_2$, the sum of the ranks
of the values of the second sample. It does not matter whether we choose $W_1$ or
$W_2$, for if there are $n_1$ values in the first sample and $n_2$ values in the second
sample, $W_1 + W_2$ is the sum of the first $n_1 + n_2$ positive integers; that is,

$$W_1 + W_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2}$$

for any pair of values of $W_1$ and $W_2$. Thus, tests based on $W_1$ and $W_2$ are
equivalent.
In actual practice, we seldom base tests on the statistics $W_1$ or $W_2$; instead,
we use the related statistics

$$U_1 = W_1 - \frac{n_1(n_1+1)}{2} \quad\text{or}\quad U_2 = W_2 - \frac{n_2(n_2+1)}{2}$$

or the statistic $\min(U_1, U_2)$. The resulting tests are all equivalent to the ones based
on $W_1$ or $W_2$, but they have the advantage that they lend themselves more readily
to the construction of tables of critical values. As the reader will be asked to
verify in Exercise 3 on page 536, the sum of the values of $U_1$ and $U_2$ is always
$n_1 n_2$, and both of these random variables take on the same range of values from
0 to $n_1 n_2$. Indeed, they have identical distributions, symmetrical about $\frac{n_1 n_2}{2}$.


Regardless of the alternative hypothesis, we can thus base all tests of the
null hypothesis $\mu_1 = \mu_2$ on the sampling distribution of $U = \min(U_1, U_2)$, but
as on page 525 we have to be careful to use the right statistic and the right critical
value of U, as summarized in the following table:

Alternative hypothesis     Reject the null hypothesis if:
$\mu_1 \ne \mu_2$          $U \le U_\alpha$
$\mu_1 > \mu_2$            $U_2 \le U_{2\alpha}$
$\mu_1 < \mu_2$            $U_1 \le U_{2\alpha}$

where the level of significance is $\alpha$ for each test. The critical values of U, which
are such that $U_\alpha$ is the largest value for which $P(U \le U_\alpha)$ does not exceed $\alpha$,
are given in Table X at the end of the book. Note that, as in connection with
Table IX, the same critical values serve for tests at different levels of significance
depending on whether the alternative hypothesis is one-sided or two-sided.
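Under the null hypothesis all $\binom{n_1+n_2}{n_1}$ orderings of the combined sample are equally likely, so the exact distribution of $U_1$ (which, by Exercise 6, counts the pairs with $x_i > y_j$) can be built with a simple recursion on the largest observation. The sketch below, with hypothetical function names, computes a critical value in the sense just defined and checks the moments claimed in Theorem 16.2; compare its output with Table X rather than treating it as authoritative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def u_count(m, n, u):
    """Number of orderings of m x's and n y's with exactly u pairs x > y."""
    if u < 0 or u > m * n:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest observation is either an x (it beats all n y's) or a y.
    return u_count(m - 1, n, u - n) + u_count(m, n - 1, u)


def prob_U_at_most(n1, n2, t):
    """Exact P(min(U1, U2) <= t), using U2 = n1*n2 - U1."""
    hits = sum(u_count(n1, n2, v) for v in range(n1 * n2 + 1)
               if min(v, n1 * n2 - v) <= t)
    return Fraction(hits, comb(n1 + n2, n1))


def critical_U(n1, n2, alpha):
    """Largest t with P(U <= t) <= alpha, or None if there is none."""
    alpha = Fraction(str(alpha))
    best = None
    for t in range(n1 * n2 // 2 + 1):
        if prob_U_at_most(n1, n2, t) > alpha:
            break
        best = t
    return best


def exact_moments_U1(n1, n2):
    """Mean and variance of U1 from its exact null distribution."""
    total = comb(n1 + n2, n1)
    vals = range(n1 * n2 + 1)
    mean = Fraction(sum(v * u_count(n1, n2, v) for v in vals), total)
    second = Fraction(sum(v * v * u_count(n1, n2, v) for v in vals), total)
    return mean, second - mean ** 2


print(critical_U(9, 10, 0.10), exact_moments_U1(9, 10))
```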


EXAMPLE 16.6 


With reference to the data on page 530, test at the 0.05 level of significance 
whether the two samples come from identical continuous populations or whether 
the mean burning time of Brand A flares is less than that of Brand B flares. 


Solution
1. $H_0$: $\mu_1 = \mu_2$
$H_1$: $\mu_1 < \mu_2$
2. Since $n_1 = 9$ and $n_2 = 10$, reject the null hypothesis if $U_1 \le 24$, where
24 is the corresponding value of $U_{0.10}$.
3. Using the ranks obtained on page 530, we get

$$W_1 = 1 + 3 + 4 + 5 + 7 + 10 + 12 + 13 + 14 = 69$$

so that

$$U_1 = 69 - \frac{9 \cdot 10}{2} = 24$$

4. Since $U_1 = 24$ equals $U_{0.10} = 24$, the null hypothesis must be rejected;
we conclude that on the average Brand A flares have a shorter burning
time than Brand B flares. ▲
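Step 3 amounts to ranking the two samples jointly and converting the rank sum into $U_1$. A short Python sketch of that bookkeeping for the flare data (the helper name `average_ranks` is hypothetical):

```python
def average_ranks(values):
    """Ranks 1..len(values); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


brand_a = [14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9]
brand_b = [15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4]

ranks = average_ranks(brand_a + brand_b)    # rank the two samples jointly
n1, n2 = len(brand_a), len(brand_b)
w1 = sum(ranks[:n1])                        # rank sum of the first sample
u1 = w1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1                           # U1 + U2 = n1 * n2
print(w1, u1, u2)  # 69.0 24.0 66.0
```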


When $n_1$ and $n_2$ are both greater than 8, it is considered reasonable to
assume that the distributions of $U_1$ and $U_2$ can be approximated closely by normal
distributions. To perform the given rank-sum test on the basis of this assumption,
we need the following results:


THEOREM 16.2 Under the null hypothesis, the means and the variances of
$U_1$ and $U_2$ are

$$E(U_1) = E(U_2) = \frac{n_1 n_2}{2}$$

and

$$\operatorname{var}(U_1) = \operatorname{var}(U_2) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$

Proof. Under the null hypothesis that the two samples come from
identical populations which are continuous (so that the probability is zero
that there will be any ties), the random variable $W_1$ is the sum of $n_1$ positive
integers selected at random from among the first $n_1 + n_2$ positive integers.
Making use of the results of part (c) of Exercise 13 on page 282 with $n = n_1$
and $N = n_1 + n_2$, we thus find that

$$E(W_1) = \frac{n_1(n_1 + n_2 + 1)}{2}$$

and

$$\operatorname{var}(W_1) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$

Since $U_1 = W_1 - \frac{n_1(n_1+1)}{2}$, it follows that

$$E(U_1) = \frac{n_1(n_1 + n_2 + 1)}{2} - \frac{n_1(n_1 + 1)}{2} = \frac{n_1 n_2}{2}$$

and

$$\operatorname{var}(U_1) = \operatorname{var}(W_1) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$

Also, since $U_1 + U_2 = n_1 n_2$ for any pair of values of $U_1$ and $U_2$ [see part
(a) of Exercise 3 on page 536], we get

$$E(U_2) = n_1 n_2 - E(U_1) = \frac{n_1 n_2}{2}$$

and $\operatorname{var}(U_2) = \operatorname{var}(U_1)$. ∎


EXAMPLE 16.7

The following are the weight gains (in pounds) of two random samples of young
turkeys fed two different diets but otherwise kept under identical conditions:

Diet 1: 16.3, 10.1, 10.7, 13.5, 14.9, 11.8, 14.3, 10.2, 12.0, 14.7, 23.6, 15.1, 14.5,
        18.4, 13.2, 14.0
Diet 2: 21.3, 23.8, 15.4, 19.6, 12.0, 13.9, 18.8, 19.2, 15.3, 20.1, 14.8, 18.9, 20.7,
        21.1, 15.8, 16.2

Use the 0.01 level of significance to test the null hypothesis that the populations
sampled are identical against the alternative hypothesis that on the average the
second diet produces a greater gain in weight.


Solution
(It does not matter in a problem like this whether we base the test on $U_1$
or $U_2$.)
1. $H_0$: $\mu_1 = \mu_2$
$H_1$: $\mu_1 < \mu_2$
2. Reject the null hypothesis if $z \le -z_{0.01} = -2.33$, where

$$z = \frac{U_1 - E(U_1)}{\sqrt{\operatorname{var}(U_1)}}$$

3. Ranking the data jointly according to size, we find that the values of
the first sample occupy ranks 21, 1, 3, 8, 15, 4, 11, 2, 5.5, 13, 31, 16, 12,
22, 7, and 10. (The fifth and sixth smallest values are both 12.0, so we assigned
each the rank 5.5.) Thus,

$$W_1 = 1 + 2 + 3 + 4 + 5.5 + 7 + 8 + 10 + 11 + 12 + 13 + 15 + 16 + 21 + 22 + 31 = 181.5$$

and

$$U_1 = 181.5 - \frac{16 \cdot 17}{2} = 45.5$$

Since $E(U_1) = \frac{16 \cdot 16}{2} = 128$ and $\operatorname{var}(U_1) = \frac{16 \cdot 16 \cdot 33}{12} = 704$,
we get

$$z = \frac{45.5 - 128}{\sqrt{704}} = -3.11$$

4. Since $z = -3.11$ is less than $-2.33$, the null hypothesis must be rejected;
we conclude that on the average the second diet produces a greater
gain in weight. ▲
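The normal approximation of Example 16.7 can be sketched in Python as follows. Tied values receive the mean of their ranks, and, as in the worked solution, the variance of Theorem 16.2 is used without any adjustment for ties; the helper name is hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..len(values); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


diet1 = [16.3, 10.1, 10.7, 13.5, 14.9, 11.8, 14.3, 10.2,
         12.0, 14.7, 23.6, 15.1, 14.5, 18.4, 13.2, 14.0]
diet2 = [21.3, 23.8, 15.4, 19.6, 12.0, 13.9, 18.8, 19.2,
         15.3, 20.1, 14.8, 18.9, 20.7, 21.1, 15.8, 16.2]

n1, n2 = len(diet1), len(diet2)
ranks = average_ranks(diet1 + diet2)        # the two 12.0's both get rank 5.5
w1 = sum(ranks[:n1])
u1 = w1 - n1 * (n1 + 1) / 2
mean = n1 * n2 / 2                          # Theorem 16.2
var = n1 * n2 * (n1 + n2 + 1) / 12
z = (u1 - mean) / math.sqrt(var)
print(w1, u1, mean, var, round(z, 2))       # 181.5 45.5 128.0 704.0 -3.11
```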


16.5 RANK-SUM TESTS: THE H TEST


The H test, or the Kruskal-Wallis test, is a generalization of the rank-sum test
of the preceding section to the case where we test the null hypothesis that k
samples come from identical continuous populations. In other words, it is a
nonparametric alternative to the one-way analysis of variance.

As in the U test, the data are ranked jointly from low to high, as though
they constitute one sample. Then, letting $R_i$ be the sum of the ranks of the values
of the ith sample, we base the test on the statistic

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$

where $n = n_1 + n_2 + \cdots + n_k$ and k is the number of populations sampled. As
it can be shown (see Exercise 7 on page 537) that the H statistic is proportional
to a weighted mean of the squared differences $\left(\frac{R_i}{n_i} - \frac{n+1}{2}\right)^2$, where $\frac{R_i}{n_i}$ is the
mean rank of the values of the ith sample and $\frac{n+1}{2}$ is the mean rank of all the
data, it follows that the null hypothesis must be rejected for large values of H.
For very small values of k and the $n_i$, the test of the null hypothesis may be
based on special tables (see references on page 552), but since the sampling
distribution of H depends on the values of the $n_i$, it is impossible to tabulate it
in a compact form. Hence, the test is usually based on the large-sample theory
that the sampling distribution of H can be approximated closely with a chi-square
distribution with k − 1 degrees of freedom. Proofs of this result may be found
in the books on nonparametric statistics referred to on page 553, and they are
based on the form of the H statistic as it is given in Exercise 7 on page 537.
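The H statistic can be computed directly from its definition: rank all observations jointly (mean ranks for ties), form the rank sums $R_i$, and substitute. A minimal Python sketch with hypothetical helper names, applied to the grades of Example 16.8 below:

```python
def average_ranks(values):
    """Ranks 1..len(values); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_H(*samples):
    """H = 12/(n(n+1)) * sum(R_i**2 / n_i) - 3(n+1), with joint ranking."""
    pooled = [x for sample in samples for x in sample]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for sample in samples:
        r_i = sum(ranks[start:start + len(sample)])   # rank sum R_i
        total += r_i ** 2 / len(sample)
        start += len(sample)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


H = kruskal_wallis_H([94, 88, 91, 74, 87, 97],
                     [85, 82, 79, 84, 61, 72, 80],
                     [89, 67, 72, 76, 69])
print(round(H, 2))  # 6.67
```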


EXAMPLE 16.8 


The following are the final examination grades of samples from three groups of
students who were taught German by three different methods (classroom instruc-
tion and language laboratory, only classroom instruction, and only self-study in
language laboratory):

First method: 94, 88, 91, 74, 87, 97
Second method: 85, 82, 79, 84, 61, 72, 80
Third method: 89, 67, 72, 76, 69

Use the H test at the 0.05 level of significance to test the null hypothesis that
the three methods are equally effective.


Solution
1. $H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_1$: $\mu_1$, $\mu_2$, and $\mu_3$ are not all equal
2. Reject the null hypothesis if $H \ge 5.991$, where 5.991 is the value of $\chi^2_{0.05,2}$.
3. Ranking the grades from 1 to 18, we find that $R_1 = 6 + 13 + 14 + 16 +
17 + 18 = 84$, $R_2 = 1 + 4.5 + 8 + 9 + 10 + 11 + 12 = 55.5$, and
$R_3 = 2 + 3 + 4.5 + 7 + 15 = 31.5$, where there is one tie and the tied
grades are each assigned the rank 4.5. Substituting the values of $R_1$,
$R_2$, and $R_3$ together with $n_1 = 6$, $n_2 = 7$, $n_3 = 5$, and n = 18 into the
formula for H, we get

$$H = \frac{12}{18 \cdot 19}\left(\frac{84^2}{6} + \frac{55.5^2}{7} + \frac{31.5^2}{5}\right) - 3 \cdot 19 = 6.67$$

4. Since H = 6.67 exceeds $\chi^2_{0.05,2} = 5.991$, the null hypothesis must be
rejected; we conclude that the three methods are not all equally
effective. ▲
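For the special case k = 3 used here (k − 1 = 2 degrees of freedom), the chi-square distribution is exponential with mean 2, so $P(\chi^2 \ge h) = e^{-h/2}$ and the upper $\alpha$ point is $-2\ln\alpha$. This lets one reproduce the table value 5.991 and an approximate p-value without tables; the shortcut is valid only for 2 degrees of freedom.

```python
import math

# Chi-square with 2 degrees of freedom: P(chi2 >= h) = exp(-h / 2),
# so the upper alpha point is -2 * ln(alpha).
alpha = 0.05
critical = -2 * math.log(alpha)
h = 6.67                         # value of H from Example 16.8
p_value = math.exp(-h / 2)       # large-sample approximation
print(round(critical, 3), round(p_value, 4))  # 5.991 0.0356
```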


THEORETICAL EXERCISES 


1. Show that under the null hypotheses of Section 16.3, the distribution of $T^+$
is symmetrical about $\frac{n(n+1)}{4}$.

2. With reference to the signed-rank test, find $E(T^+ - T^-)$ and $\operatorname{var}(T^+ - T^-)$.

3. Show that
(a) $U_1 + U_2 = n_1 n_2$ for any pair of values of $U_1$ and $U_2$;
(b) $U_1$ and $U_2$ both take on values on the range from 0 to $n_1 n_2$.

4. Show that the distribution of $W_1$ is symmetrical about

$$\frac{n_1(n_1 + n_2 + 1)}{2}$$

and, hence, that the distribution of $U_1$ is symmetrical about $\frac{n_1 n_2}{2}$. (Hint: Rank the
combined data in an increasing as well as a decreasing order of magnitude.)

5. Verify that $U_1$ and $U_2$ are also given by

$$U_1 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - W_2$$

and

$$U_2 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - W_1$$

6. If $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ are independent random samples, we
can test the null hypothesis that they come from identical continuous popula-
tions on the basis of the Mann-Whitney statistic U, which is simply the
number of pairs $(x_i, y_j)$ for which $x_i > y_j$. Symbolically,

$$U = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} d_{ij}$$

where

$$d_{ij} = \begin{cases} 1 & \text{if } x_i > y_j \\ 0 & \text{if } x_i < y_j \end{cases}$$

for $i = 1, 2, \ldots, n_1$ and $j = 1, 2, \ldots, n_2$. Making use of the fact that

$$\sum_{j=1}^{n_2} d_{ij} = r_i - m_i$$

where $r_i$ is the rank of $x_i$ and $m_i$ is the number of x's that are less than or
equal to $x_i$, show that U is the same as the $U_1$ statistic of Section 16.4.

7. Verify that the Kruskal-Wallis statistic on page 535 is equivalent to

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} n_i \left(\frac{R_i}{n_i} - \frac{n+1}{2}\right)^2$$

8. Show that if a one-way analysis of variance is performed on the ranks of the
observations instead of the observations themselves, it becomes equivalent
to a test based on the H statistic.
APPLIED EXERCISES

9. The following are the amounts of time, in minutes, which it took a random
sample of 20 technicians to perform a certain task:

18.1 20.3 18.3 15.6 22.5 16.8 17.6 16.9 18.2 17.0
19.3 16.5 19.5 18.6 20.0 18.8 19.1 17.5 18.5 18.0

Use the sign test at the 0.05 level of significance to test the null hypothesis
that these measurements constitute a random sample from a continuous
population with the mean $\mu = 19.4$ minutes against the two-sided alternative
$\mu \ne 19.4$ minutes. Base the test on Table I.

10. Rework the preceding exercise using the normal approximation to the
binomial distribution.

11. Rework Exercise 9 using the signed-rank test based on Table IX.

12. The following are the amounts of money (in dollars) spent by 16 persons at
a certain amusement park: 10.15, 9.85, 13.75, 8.63, 11.09, 15.63, 6.65, 9.27,
8.80, 11.45, 10.29, 9.51, 13.80, 10.00, 7.48, and 9.11. Use the sign test at the
0.05 level of significance to test the null hypothesis that on the average a
person spends $9.00 at the park against the alternative that this figure is too
low. Base the test on Table I.

13. Rework the preceding exercise using the signed-rank test based on the normal
approximation to the distribution of the test statistic.

14. The following are the numbers of speeding tickets issued by two policemen
on a random sample of 30 days: 7 and 10, 11 and 13, 10 and 11, 14 and 14,
11 and 15, 12 and 9, 6 and 10, 9 and 13, 8 and 11, 10 and 11, 11 and 15, 13
and 11, 7 and 10, 6 and 12, 10 and 14, 8 and 8, 11 and 12, 9 and 14, 9 and
7, 10 and 12, 6 and 7, 12 and 14, 9 and 11, 12 and 10, 11 and 13, 12 and 15,
7 and 9, 10 and 9, 11 and 13, and 8 and 10. Use the sign test at the 0.05 level
of significance to test the null hypothesis that on the average the two policemen
issue equally many speeding tickets against the alternative hypothesis that
on the average the second policeman issues more speeding tickets than the
first.

15. The following are the numbers of employees absent from the two subsidiaries
of a large firm on 16 days: 36 and 23, 20 and 12, 10 and 13, 15 and 11, 33
and 26, 23 and 21, 18 and 23, 25 and 24, 17 and 10, 24 and 34, 30 and 21,
18 and 24, 25 and 25, 19 and 14, 22 and 11, 28 and 16. Use the sign test at
the 0.05 level of significance to test the null hypothesis that on the average
there are equally many absences in the two subsidiaries against the alternative
that on the average there are more absences in the first subsidiary. Base the
test on Table I.

16. Use the signed-rank test based on Table IX to rework the preceding exercise.


17. Rework Exercise 15 using the signed-rank test based on the normal approxi-
mation to the distribution of the test statistic.

18. The following are figures on the numbers of burglaries committed in a city
in random samples of six days in the spring and six days in the fall:

Spring: 36, 25, 32, 38, 28, 35
Fall: 27, 20, 15, 29, 18, 22

Use the U test at the 0.01 level of significance to test the claim that on the
average there are equally many burglaries per day in the spring as in the fall
against the alternative that there are fewer in the fall.

19. The following are the Rockwell hardness numbers obtained for six aluminum
die castings randomly selected from production lot A and eight from produc-
tion lot B:

Production lot A: 75, 56, 63, 70, 58, 74
Production lot B: 63, 85, 77, 80, 86, 76, 72, 82

Use the U test at the 0.05 level of significance to test whether the castings
of production lot B are on the average equally hard or whether they are
harder than those of production lot A.

20. The following are the numbers of minutes it took random samples of 15 men
and 12 women to complete a written test given for the renewal of their driver's
licenses:

Men: 9.9, 7.4, 8.9, 9.1, 7.7, 9.7, 11.8, 9.2, 10.0, 10.2, 9.5, 10.8, 8.0, 11.0, 7.5
Women: 8.6, 10.9, 9.8, 10.7, 9.4, 10.3, 7.3, 11.5, 7.6, 9.3, 8.8, 9.6

Use the U test based on Table X at the 0.05 level of significance to decide
whether to accept the null hypothesis $\mu_1 = \mu_2$ or the alternative hypothesis
$\mu_1 \ne \mu_2$, where $\mu_1$ and $\mu_2$ are the average amounts of time it takes men and
women to complete the test.

21. Rework the preceding exercise using the normal approximation to the distri-
bution of the test statistic.

22. An examination designed to measure basic knowledge of American history
was given to random samples of freshmen at two major universities, and
their grades were

University A: 17, 72, 58, 92, 87, 93, 97, 91, 70, 98,
              76, 90, 62, 69, 90, 78, 96, 84, 73, 80
University B: 89, 74, 45, 56, 71, 74, 94, 88, 66, 62,
              88, 63, 88, 37, 63, 75, 78, 34, 75, 68

Use the U test at the 0.05 level of significance to test the null hypothesis that
there is no difference in the average knowledge of American history between
freshmen entering the two universities.

23. The following are data on the breaking strength (in pounds) of random
samples of two kinds of 2-inch cotton ribbons:

Type I ribbon: 144, 181, 200, 187, 169, 171, 186, 194,
               176, 182, 133, 183, 197, 165, 180, 198
Type II ribbon: 175, 164, 172, 194, 176, 198, 154, 134,
                169, 164, 185, 159, 161, 189, 170, 164

Use the U test at the 0.05 level of significance to test the claim that Type I
ribbon is on the average stronger than Type II ribbon.

24. With reference to the data on page 530 and Example 16.6 on page 532,
calculate the value of the Mann-Whitney U statistic (see Exercise 6) and
verify that it equals the value obtained for $U_1$.

25. With reference to Exercise 19, calculate the value of the Mann-Whitney U
statistic (see Exercise 6) and verify that it equals the value obtained for $U_1$.

26. To compare four bowling balls, a professional bowler bowls five games with
each ball and gets the following results:

Ball D: 208, 220, 247, 192, 229
Ball E: 216, 196, 189, 205, 210
Ball F: 226, 218, 252, 225, 202
Ball G: 212, 198, 207, 232, 221

Use the Kruskal-Wallis test at the 0.05 level of significance to test whether
or not the bowler can expect to score equally well with the four bowling balls.

27. The following are the miles per gallon which a test driver got for ten tankfuls
of each of three kinds of gasoline:

Gasoline A: 20, 31, 24, 33, 23, 24, 28, 16, 19, 26
Gasoline B: 29, 18, 29, 19, 20, 21, 34, 33, 30, 23
Gasoline C: 19, 31, 16, 26, 31, 33, 28, 28, 25, 30

Use the Kruskal-Wallis test at the 0.05 level of significance to test whether
or not there is a difference in the actual average mileage yield of the three
kinds of gasoline.
16.6 TESTS BASED ON RUNS


There are several nonparametric methods for testing the randomness of observed 
data on the basis of the order in which they were obtained. The technique we 
shall describe here is based on the theory of runs, where a run is a succession of 
identical letters (or other kinds of symbols) which is preceded and followed by 
different letters or no letters at all. To illustrate, consider the following arrange- 
ment of defective, d, and nondefective, n, pieces produced in the given order by 
a certain machine: 


nnnnnddddnnnnnnnnnnddnnddddnddnn


Grouping the letters which constitute each run, we find that there is
first a run of five n's, then a run of four d's, then a run of ten n's, ..., and finally
a run of two n's; in all, there are nine runs of varying lengths.

The total number of runs appearing in an arrangement of this kind is often
a good indication of a possible lack of randomness. If there are too few runs,
we might suspect a definite grouping or clustering, or perhaps a trend; if there
are too many runs, we might suspect some sort of repeated alternating pattern.
In our illustration there seems to be a definite clustering (the defective pieces
seem to come in groups), but it remains to be seen whether this is significant or
whether it can be attributed to chance.

To find the probability that $n_1$ letters of one kind and $n_2$ letters of another
kind will form u runs when each of the $\binom{n_1+n_2}{n_1}$ possible arrangements of these
letters is regarded as equally likely, let us first investigate the case where u is
even, namely, where u = 2k and k is a positive integer. In that case there will
have to be k runs of each kind alternating with one another. To find the number
of ways in which $n_1$ letters can form k runs, let us first consider the very simple
case where we have five letters c which are to be divided up into three runs.
Using vertical bars to separate the five letters into three runs, we find that there
are the six possibilities

c|c|ccc    c|cc|cc    c|ccc|c
cc|c|cc    cc|cc|c    ccc|c|c

corresponding to the $\binom{4}{2}$ ways in which we can put two vertical bars into two
of the four spaces between the five c's. By the same token there are $\binom{n_1-1}{k-1}$
ways in which the $n_1$ letters of the first kind can form k runs, $\binom{n_2-1}{k-1}$ ways in
which the $n_2$ letters of the second kind can form k runs, and it follows that there
are altogether $2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}$ ways in which these $n_1 + n_2$ letters can form
2k runs. The factor 2 is accounted for by the fact that when we combine the two
kinds of runs so that they alternate, we can begin either with a run of the first
kind of letter or with a run of the second kind. Thus, when u = 2k (where k is
a positive integer), the probability of getting that many runs is

$$f(u) = \frac{2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}}{\binom{n_1+n_2}{n_1}}$$

and it will be left to the reader to show in Exercise 1 on page 548 that similar
arguments lead to

$$f(u) = \frac{\binom{n_1-1}{k}\binom{n_2-1}{k-1} + \binom{n_1-1}{k-1}\binom{n_2-1}{k}}{\binom{n_1+n_2}{n_1}}$$

when u = 2k + 1 (where k is a positive integer).

When $n_1$ and $n_2$ are small, tests of randomness based on u are usually
performed with the use of special tables such as Table XI at the end of the book.
We reject the null hypothesis of randomness at the level of significance $\alpha$ if

$$u \le u'_{\alpha/2} \quad\text{or}\quad u \ge u_{\alpha/2}$$

where $u'_{\alpha/2}$ is the largest value for which $P(u \le u'_{\alpha/2})$ does not exceed $\alpha/2$ and
$u_{\alpha/2}$ is the smallest value for which $P(u \ge u_{\alpha/2})$ does not exceed $\alpha/2$.
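The two formulas for f(u) give the exact null distribution of the number of runs, from which Table XI style critical values follow directly. A minimal Python sketch, with hypothetical function names, that implements the even and odd cases and then searches for $u'_{\alpha/2}$ and $u_{\alpha/2}$ as just defined:

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1, n2):
    """Exact null distribution of the number of runs u (n1, n2 >= 1)."""
    total = comb(n1 + n2, n1)
    pmf = {}
    for u in range(2, n1 + n2 + 1):
        k = u // 2
        if u % 2 == 0:                      # u = 2k
            count = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:                               # u = 2k + 1
            count = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                     + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if count:
            pmf[u] = Fraction(count, total)
    return pmf


def runs_critical_values(n1, n2, alpha):
    """Return (u'_{alpha/2}, u_{alpha/2}); an entry is None if it does not exist."""
    half = Fraction(str(alpha)) / 2
    pmf = runs_pmf(n1, n2)
    lower = upper = None
    cum = Fraction(0)
    for u in sorted(pmf):                   # largest u with P(U <= u) <= alpha/2
        cum += pmf[u]
        if cum > half:
            break
        lower = u
    cum = Fraction(0)
    for u in sorted(pmf, reverse=True):     # smallest u with P(U >= u) <= alpha/2
        cum += pmf[u]
        if cum > half:
            break
        upper = u
    return lower, upper


print(runs_critical_values(13, 9, 0.05))  # (6, 17)
```

For $n_1 = 13$ and $n_2 = 9$ this reproduces the values 6 and 17 used in Example 16.9.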
EXAMPLE 16.9 


Checking on elm trees that were planted many years ago along a country road,
a county official obtained the following arrangement of healthy, H, and diseased,
D, trees:

HHHHDDDHHHHHHHDDHHDDDD

Test at the 0.05 level of significance whether this arrangement may be regarded
as random.

Solution
1. $H_0$: Arrangement is random.
$H_1$: Arrangement is not random.
2. Since $n_1 = 13$ and $n_2 = 9$, reject the null hypothesis if $u \le 6$ or $u \ge 17$,
where 6 and 17 are the corresponding values of $u'_{0.025}$ and $u_{0.025}$.
3. u = 6 by inspection of the data.
4. Since u = 6 is less than or equal to 6, the null hypothesis must be
rejected; the arrangement of healthy and diseased elm trees is not
random. It appears that the diseased trees come in clusters. ▲
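Counting runs "by inspection", as in step 3, is easy to automate: a run is a maximal block of identical consecutive symbols, which is exactly what Python's `itertools.groupby` yields. The helper name `count_runs` is hypothetical.

```python
from itertools import groupby


def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))


trees = "HHHHDDDHHHHHHHDDHHDDDD"
print(count_runs(trees), trees.count("H"), trees.count("D"))  # 6 13 9
```

Applied to the machine-output sequence at the beginning of this section, the same function gives the nine runs counted there.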


When $n_1$ and $n_2$ are both greater than or equal to 10, it is considered
reasonable to assume that the distribution of u can be approximated closely with
a normal curve. To perform the runs test on the basis of this assumption, we
need the following results:


THEOREM 16.3 Under the null hypothesis of randomness, the mean and
the variance of u are

$$E(u) = \frac{2n_1n_2}{n_1 + n_2} + 1$$

and

$$\operatorname{var}(u) = \frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}$$

These results can be obtained directly with the use of the probabilities given on
page 542. The details of such a proof, as well as an alternative approach which
is easier, may be found in the book by J. D. Gibbons listed on page 552.
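Theorem 16.3 can be checked numerically (not proved) by computing the mean and variance of the exact distribution f(u) and comparing them with the two formulas in exact rational arithmetic. The sketch below re-implements f(u) from the formulas given earlier; all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1, n2):
    """Exact null distribution of the number of runs u (n1, n2 >= 1)."""
    total = comb(n1 + n2, n1)
    pmf = {}
    for u in range(2, n1 + n2 + 1):
        k = u // 2
        if u % 2 == 0:
            count = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            count = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                     + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        pmf[u] = Fraction(count, total)
    return pmf


def theorem_mean_var(n1, n2):
    """E(u) and var(u) as stated in Theorem 16.3."""
    n = n1 + n2
    mean = Fraction(2 * n1 * n2, n) + 1
    var = Fraction(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2), n * n * (n - 1))
    return mean, var


def exact_mean_var(n1, n2):
    """E(u) and var(u) computed directly from the exact distribution."""
    pmf = runs_pmf(n1, n2)
    mean = sum(u * p for u, p in pmf.items())
    var = sum(u * u * p for u, p in pmf.items()) - mean ** 2
    return mean, var


for n1, n2 in [(13, 9), (20, 12), (5, 5), (3, 7)]:
    assert exact_mean_var(n1, n2) == theorem_mean_var(n1, n2)
print(theorem_mean_var(20, 12))  # (Fraction(16, 1), Fraction(210, 31))
```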


EXAMPLE 16.10 


With reference to the illustration on page 541, the one dealing with an arrangement 
of defective and nondefective pieces produced by a certain machine, test for 
randomness at the 0.01 level of significance. 


Solution 


1. $H_0$: Arrangement is random.
$H_1$: Arrangement is not random.
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2. Reject the null hypothesis if z « —2.575 or z > 2.575, where 


(и + 3) – E(u) 


v var(u) 


First substituting n, = 20 and n; = 12 into the formulas for E(u) and 
var(u), we get 


2:20:12 


E 1-16 
Eu) ^59 E12 


and 


2 20:012(7«20:712 —20 — 12) 


3 = 6.77 
(20 + 12)? (20 + 12 - 1) $ 


var(u) = 


Then, substituting these values together with u — 9 into the formula 
for z, we arrive at the result that 


TEMOR 1) - 16 
v6.77 


Since z = —2.50 falls between —2.575 and 2.575, the null hypothesis 
cannot be rejected; in other words, there is no real indication of any 
lack of randomness. A 


= —2.50 


Note that if we had not used the continuity correction in the preceding example, we would have obtained $z = -2.69$, and the decision would have been reversed: since $-2.69$ is less than $-2.575$, the null hypothesis would have been rejected.

The method we have discussed in this section is not limited to tests of the randomness of series of attributes (such as the d's and n's of our example). Any sample which consists of numerical measurements or observations can be treated similarly by using the letters $a$ and $b$ to denote, respectively, values falling above and below the median of the sample. (Numbers equaling the median are omitted.) The resulting series of $a$'s and $b$'s can then be tested for randomness on the basis of the total number of runs of $a$'s and $b$'s, namely, the total number of runs above and below the median.


EXAMPLE 16.11

The following are the speeds (in miles per hour) at which every fifth passenger car was timed at a certain checkpoint: 46, 58, 60, 56, 70, 66, 48, 54, 62, 41, 39, 52, 45, 62, 53, 69, 65, 65, 67, 76, 52, 52, 59, 59, 67, 51, 46, 61, 40, 43, 42, 77, 67, 63, 59, 63, 63, 72, 57, 59, 42, 56, 47, 62, 67, 70, 63, 66, 69, and 73. Test the null hypothesis of randomness at the 0.05 level of significance.


Solution

1. $H_0$: The sample is random.
$H_1$: The sample is not random.

2. Reject the null hypothesis if $z < -1.96$ or $z > 1.96$, where

$$z = \frac{(u + \frac{1}{2}) - E(u)}{\sqrt{\operatorname{var}(u)}}$$

and $u$ is the number of runs above and below the median.

3. Since the median of the speeds is 59.5, we get the following arrangement of $a$'s and $b$'s:

bbabaabbabbbbabaaaaabbbba
bbabbbaaabaaabbbbbaaaaaaa

Then, since $n_1 = 25$, $n_2 = 25$, and $u = 20$, we get

$$E(u) = \frac{2 \cdot 25 \cdot 25}{25 + 25} + 1 = 26$$

$$\operatorname{var}(u) = \frac{2 \cdot 25 \cdot 25(2 \cdot 25 \cdot 25 - 25 - 25)}{(25 + 25)^2(25 + 25 - 1)} = 12.24$$

and

$$z = \frac{(20 + \frac{1}{2}) - 26}{\sqrt{12.24}} = -1.57$$

4. Since $z = -1.57$ falls between $-1.96$ and $1.96$, the null hypothesis cannot be rejected; there is no real evidence that the sample should not be regarded as random. ▲
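The whole procedure of Example 16.11 can be sketched in Python: find the median, form the sequence of $a$'s and $b$'s, count the runs, and compute $z$. The helper `runs_above_below_median` is hypothetical. It uses the same continuity correction as the example, moving $u$ half a unit toward $E(u)$.

```python
import math
import statistics
from itertools import groupby

# Speeds (in miles per hour) from Example 16.11, in the order observed
speeds = [46, 58, 60, 56, 70, 66, 48, 54, 62, 41, 39, 52, 45, 62, 53, 69, 65,
          65, 67, 76, 52, 52, 59, 59, 67, 51, 46, 61, 40, 43, 42, 77, 67, 63,
          59, 63, 63, 72, 57, 59, 42, 56, 47, 62, 67, 70, 63, 66, 69, 73]


def runs_above_below_median(data):
    """Return (median, letters, n1, n2, u, z) for the runs test above and below the median."""
    median = statistics.median(data)
    # 'a' marks values above the median, 'b' values below; ties with the median are dropped
    letters = "".join("a" if x > median else "b" for x in data if x != median)
    n1, n2 = letters.count("a"), letters.count("b")
    u = sum(1 for _ in groupby(letters))  # total number of runs
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    # continuity correction: move u half a unit toward E(u)
    if u < mean:
        u_corrected = u + 0.5
    elif u > mean:
        u_corrected = u - 0.5
    else:
        u_corrected = u
    z = (u_corrected - mean) / math.sqrt(var)
    return median, letters, n1, n2, u, z


median, letters, n1, n2, u, z = runs_above_below_median(speeds)
print(median, n1, n2, u, round(z, 2))  # 59.5 25 25 20 -1.57
```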


16.7 THE RANK CORRELATION COEFFICIENT

Since the assumptions underlying the significance test for the correlation coefficient are rather stringent, it is sometimes preferable to use a nonparametric alternative. Most popular among the nonparametric measures of association is the rank correlation coefficient, also called Spearman's rank correlation coefficient, $r_s$. For a given set of paired data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, it is obtained by ranking the $x$'s among themselves, and also the $y$'s, both from low to high or from high to low, and then substituting into the following formula.


DEFINITION 16.1 The rank correlation coefficient is given by

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the ranks assigned to $x_i$ and $y_i$.


When there are ties in rank, we proceed as before and assign the tied observations 
the mean of the ranks which they jointly occupy. 

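When there are no ties in rank, $r_s$ actually equals the correlation coefficient $r$ calculated for the ranks. To verify this, let $r_i$ and $s_i$ be the ranks of $x_i$ and $y_i$. Making use of the fact that the sum and the sum of the squares of the first $n$ positive integers are $\frac{n(n+1)}{2}$ and $\frac{n(n+1)(2n+1)}{6}$, respectively, we find that

$$\sum_{i=1}^{n} r_i = \sum_{i=1}^{n} s_i = \frac{n(n+1)}{2}, \qquad \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} s_i^2 = \frac{n(n+1)(2n+1)}{6}$$

and, since $d_i = r_i - s_i$,

$$\sum_{i=1}^{n} r_i s_i = \frac{n(n+1)(2n+1)}{6} - \frac{1}{2}\sum_{i=1}^{n} d_i^2$$

and if we substitute these expressions into the formula for $r$, we get the above formula for $r_s$.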
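For readers who want the omitted algebra, here is one way to carry out the substitution. It assumes no ties, so that both sets of ranks are $1, 2, \ldots, n$, and it assumes the computing form $r = \frac{n\sum r_is_i - (\sum r_i)(\sum s_i)}{\sqrt{n\sum r_i^2 - (\sum r_i)^2}\,\sqrt{n\sum s_i^2 - (\sum s_i)^2}}$ of the correlation coefficient. Because the two factors under the square roots are equal, the denominator reduces to a single one of them:

```latex
\begin{align*}
\sum_{i=1}^{n} d_i^2 &= \sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} s_i^2 - 2\sum_{i=1}^{n} r_i s_i
  \;\Longrightarrow\; \sum_{i=1}^{n} r_i s_i = \frac{n(n+1)(2n+1)}{6} - \frac{1}{2}\sum_{i=1}^{n} d_i^2 \\
n\sum_{i=1}^{n} r_i^2 - \Bigl(\sum_{i=1}^{n} r_i\Bigr)^2 &= \frac{n^2(n+1)(2n+1)}{6} - \frac{n^2(n+1)^2}{4} = \frac{n^2(n^2-1)}{12} \\
n\sum_{i=1}^{n} r_i s_i - \sum_{i=1}^{n} r_i \sum_{i=1}^{n} s_i &= \frac{n^2(n^2-1)}{12} - \frac{n}{2}\sum_{i=1}^{n} d_i^2 \\
r &= \frac{\frac{n^2(n^2-1)}{12} - \frac{n}{2}\sum_{i=1}^{n} d_i^2}{\frac{n^2(n^2-1)}{12}} = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} = r_s
\end{align*}
```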


EXAMPLE 16.12 


The following are the numbers of hours which ten students studied for an 
examination and the grades which they obtained: 
Number of
hours studied Grade 
x y 
8 56 
5 44 
11 79 
13 72 
10 70 
5 54 
18 94 
15 85 
2 33 
8 65 


Calculate rs. 


Solution 
Ranking the $x$'s and the $y$'s from high to low, and proceeding as in the following table, we get

Rank of x   Rank of y   d      d²
6.5         7           −0.5   0.25
8.5         9           −0.5   0.25
4           3            1.0   1.00
3           4           −1.0   1.00
5           5            0.0   0.00
8.5         8            0.5   0.25
1           1            0.0   0.00
2           2            0.0   0.00
10          10           0.0   0.00
6.5         6            0.5   0.25
                               3.00


and substitution into the formula for $r_s$ yields

$$r_s = 1 - \frac{6 \cdot 3.00}{10(10^2 - 1)} = 0.98$$
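Definition 16.1 can be sketched in Python together with the rule that tied observations get the mean of the ranks they jointly occupy. The helper names are hypothetical. Ranks are assigned from high to low, as in the table above. Ranking both variables from low to high instead changes the sign of every $d_i$ but not $d_i^2$, so $r_s$ is the same.

```python
def average_ranks(values, descending=True):
    """Rank values (1 = largest when descending); tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_correlation(x, y):
    """Return (r_s, sum of d^2) with r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]
rs, sum_d2 = rank_correlation(hours, grades)
print(sum_d2, round(rs, 2))  # 3.0 0.98
```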


As can be seen from the preceding example, $r_s$ is very easy to compute; indeed, it is sometimes used instead of $r$ mainly because of its computational ease. If we were to calculate $r$ for the data of the preceding example, we would get $r = 0.96$, and this is very close to $r_s = 0.98$.
For small values of $n$ ($n < 10$), the test of the null hypothesis of no correlation, indeed, the test of the null hypothesis that the $x$'s and $y$'s are randomly matched, may be based on special tables determined from the exact sampling distribution of $r_s$ (see references on page 552). Most of the time, though, we use the fact that the distribution of $r_s$ can be approximated closely with a normal distribution, and to this end we need the following results:


THEOREM 16.4 Under the null hypothesis of no correlation, the mean and the variance of $r_s$ are

$$E(r_s) = 0 \quad\text{and}\quad \operatorname{var}(r_s) = \frac{1}{n - 1}$$

A proof of this theorem may be found in the book by J. D. Gibbons referred to on page 552. Strictly speaking, the theorem applies only when there are no ties, but the result can be used as an approximation unless the number of ties is large.


EXAMPLE 16.13 


With reference to Example 16.12, test at the 0.01 level of significance whether the value obtained for the rank correlation coefficient, $r_s = 0.98$, is significant.

Solution

1. $H_0$: There is no correlation.
$H_1$: There is a correlation.

2. Reject the null hypothesis if $z < -2.575$ or $z > 2.575$, where

$$z = \frac{r_s - 0}{1/\sqrt{n - 1}} = r_s\sqrt{n - 1}$$

3. Substituting $n = 10$ and $r_s = 0.98$, we get

$$z = 0.98\sqrt{10 - 1} = 2.94$$

4. Since $z = 2.94$ exceeds 2.575, the null hypothesis must be rejected; we conclude that there is a real (positive) relationship between study time and grades. ▲
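Because $E(r_s) = 0$ and $\operatorname{var}(r_s) = 1/(n-1)$, the standardized statistic reduces to $z = r_s\sqrt{n-1}$. The following short Python sketch reproduces Example 16.13; the function name is hypothetical.

```python
import math


def rank_correlation_z(rs, n):
    """z = (r_s - 0) / sqrt(1/(n - 1)) = r_s * sqrt(n - 1), from Theorem 16.4."""
    return rs * math.sqrt(n - 1)


z = rank_correlation_z(0.98, 10)
print(round(z, 2), abs(z) > 2.575)  # 2.94 True
```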


THEORETICAL EXERCISES

1. Verify the formula given on page 542 for the values of the probability distribution of $u$ when $u = 2k + 1$, where $k$ is a positive integer.

2. If a person gets 7 heads and 3 tails in 10 tosses of a balanced coin, find the probabilities for 2, 3, 4, 5, 6, and 7 runs.

3. Find the probability that $n_1 = 6$ letters of one kind and $n_2 = 5$ letters of another kind will form at least 8 runs.

4. If there are $n_1 = 8$ letters of one kind and $n_2 = 8$ letters of another kind, what are the values of $u$ for which we would reject the null hypothesis of randomness at the 0.01 level of significance?

5. Given a set of $n$ $k$-tuples $(x_{11}, x_{12}, \ldots, x_{1k})$, $(x_{21}, x_{22}, \ldots, x_{2k})$, ..., and $(x_{n1}, x_{n2}, \ldots, x_{nk})$, the extent of their association, or agreement, may be measured by means of the coefficient of concordance

$$W = \frac{12\sum_{i=1}^{n}\left[R_i - \frac{k(n+1)}{2}\right]^2}{k^2 n(n^2 - 1)}$$

where $R_i$ is the sum of the ranks assigned to $x_{i1}, x_{i2}, \ldots$, and $x_{ik}$ when the $x$'s with the second subscript 1 are ranked among themselves, and so are the $x$'s with the second subscript 2, ..., and the $x$'s with the second subscript $k$. What are the maximum and minimum values of $W$, and what do they reflect with respect to the agreement, or lack of agreement, of the values of the $k$ random variables?


APPLIED EXERCISES

6. The following is the order in which a broker received buy, B, and sell, S, orders for a certain stock:

BBBBBBBBSSBSSSSSSBBBBB

Test for randomness at the 0.05 level of significance.

7. A driver buys gasoline either at a Texaco station, T, or at a Mobil station, M, and the following arrangement shows the order of the stations from which she bought gasoline over a certain period of time:

TTTMTMTMMTTMTMTMTMMTMT

Test for randomness at the 0.05 level of significance.

8. The following is the order in which red, R, and black, B, cards were dealt to a bridge player:

BBBRRRRRBBRRR

Test for randomness at the 0.05 level of significance.
9. The following arrangement indicates whether 60 consecutive cars which went by the toll booth of a bridge had local plates, L, or out-of-state plates, O:

LLOLLLLOOLLLLOLOOLLLLOLOOLLLLL
(cont.) OLLLOLOLLLLOOLOOOOLLLLOLOOLLLO

Test at the 0.05 level of significance whether this arrangement of L's and O's may be regarded as random.

10. To test whether a radio signal contains a message or constitutes random noise, an interval of time is subdivided into a number of very short intervals and for each of these it is determined whether the signal strength exceeds, E, or does not exceed, N, a certain level of background noise. Test at the 0.01 level of significance whether the following arrangement, thus obtained, may be regarded as random, and hence that the signal does not contain a message:

NNNENENENEEENEEENEENENEE
(cont.) NEENNENEEENENNNENNENNNNE

11. Flip a coin 100 times and test at the 0.01 level of significance whether the resulting sequence of H's and T's (heads and tails) may be regarded as random.

12. The following are the numbers of students absent from school on 24 consecutive school days: 29, 25, 31, 28, 30, 28, 33, 31, 35, 29, 31, 33, 35, 28, 36, 30, 33, 26, 30, 28, 32, 31, 38, and 27. Test for randomness at the 0.01 level of significance.

13. The following are the numbers of defective pieces produced by a machine on fifty consecutive days: 7, 14, 17, 10, 18, 19, 23, 19, 14, 10, 12, 18, 19, 13, 24, 26, 9, 16, 19, 14, 19, 10, 15, 22, 25, 24, 20, 9, 17, 28, 29, 19, 25, 23, 24, 28, 31, 19, 24, 30, 27, 24, 39, 35, 23, 26, 28, 31, 37, and 40. Test at the 0.025 level of significance whether there might be a trend.

14. The following are the numbers of lunches that an insurance agent claimed as business deductions in 30 consecutive months: 6, 7, 5, 6, 8, 6, 8, 6, 6, 4, 3, 2, 4, 4, 3, 4, 7, 5, 6, 8, 6, 6, 3, 4, 2, 5, 4, 4, 3, and 7. Use the runs test based on Table XI to test for randomness at the 0.01 level of significance.

15. The theory of runs may also be used as an alternative to the rank-sum test of Section 16.4, namely, the test of the null hypothesis that two independent random samples come from identical continuous populations. We simply rank the data jointly, write a 1 below each value belonging to the first sample, a 2 below each value belonging to the second sample, and then test the randomness of the resulting arrangement of 1's and 2's. If there are too few runs, this may well be accounted for by the fact that the two samples come from populations with unequal means. With reference to the data on page 530, use this technique to test at the 0.05 level of significance whether or not the two samples come from identical continuous populations.


16. Calculate $r_s$ for the following data representing the statistics grades, $x$, and psychology grades, $y$, of 18 students:

x y x y
78 80 97 90
86 74 74 85
49 63 53 71
94 85 58 67
53 55 62 64
89 86 74 69
94 90 74 71
71 84 70 67
70 71 74 71

17. With reference to the preceding exercise, test at the 0.05 level of significance whether the value obtained for $r_s$ is significant.


18. The following shows how a panel of nutrition experts and a panel of housewives ranked fifteen breakfast foods on their palatability:

Breakfast food   Nutrition experts   Housewives
A                3                   5
B                7                   4
C                11                  8
D                9                   14
E                1                   2
F                4                   6
G                10                  12
H                8                   7
I                5                   1
J                13                  15
K                12                  9
L                2                   3
M                15                  10
N                6                   11
O                14                  13

Calculate $r_s$ as a measure of the consistency of the two rankings.
19. Calculate $r_s$ for the data of Exercise 16 on page 460 and test the null hypothesis of no correlation at the 0.05 level of significance.

20. The following are the rankings given by three judges to the works of ten artists:

Judge A   Judge B   Judge C
[Table: rankings of the ten artists by Judges A, B, and C]

Calculate the value of $W$, the coefficient of concordance of Exercise 5, as a measure of the agreement of the three sets of rankings.

21. With reference to the preceding exercise, calculate the three pairwise rank correlation coefficients, and verify that the relationship between their mean, $\bar{r}_s$, and the coefficient of concordance (see Exercise 5) is given by

$$\bar{r}_s = \frac{kW - 1}{k - 1}$$
The Algebra of Events

I.1 BOOLEAN ALGEBRA

Since events are subsets of a sample space according to the definition on page 30, the subject matter of this appendix is the algebra of sets, with which the reader is probably familiar from more elementary work in mathematics. It is given here mainly for review.

Given two events $A$ and $B$ in a sample space $S$, we can obtain further events by forming unions, intersections, and complements.


DEFINITION I.1 The union of events $A$ and $B$, denoted $A \cup B$, is the event in $S$ which contains all the elements that are either in $A$, in $B$, or in both.

DEFINITION I.2 The intersection of events $A$ and $B$, denoted $A \cap B$, is the event in $S$ which contains all the elements that are both in $A$ and in $B$.

DEFINITION I.3 The complement of event $A$, denoted $A'$, is the event in $S$ which contains all the elements of $S$ that are not in $A$.


The formation of unions, intersections, and complements is governed by the following rules, called the postulates of Boolean algebra:

POSTULATE I.1 (Closure laws) For each pair of events $A$ and $B$ in a sample space $S$ there is a unique event $A \cup B$ and a unique event $A \cap B$ in $S$.

POSTULATE I.2 (Commutative laws)

$$A \cup B = B \cup A \quad\text{and}\quad A \cap B = B \cap A$$

POSTULATE I.3 (Associative laws)

$$(A \cup B) \cup C = A \cup (B \cup C) \quad\text{and}\quad (A \cap B) \cap C = A \cap (B \cap C)$$

POSTULATE I.4 (Distributive laws)

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$

and

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$$

POSTULATE I.5 (Identity laws) $A \cap S = A$ for each event $A$ in the sample space $S$; also, there exists a unique event $\varnothing$ such that $A \cup \varnothing = A$ for each event $A$ in $S$.

POSTULATE I.6 (Complementation law) For each event $A$ in a sample space $S$ there exists a unique event $A'$ in $S$ such that $A \cap A' = \varnothing$ and $A \cup A' = S$.

Observe that Postulate I.5 defines, in fact, what we mean by the empty set $\varnothing$, and that Postulate I.6 defines what we mean by the complement $A'$ of event $A$. Based on these postulates, we can prove many further theorems about the manipulation of events. For instance,

THEOREM I.1 $A \cap A = A$ for any event $A$ in a sample space $S$.
Proof.
$A = A \cap S$   Postulate I.5
$\quad = A \cap (A \cup A')$   Postulate I.6
$\quad = (A \cap A) \cup (A \cap A')$   Postulate I.4
$\quad = (A \cap A) \cup \varnothing$   Postulate I.6
$\quad = A \cap A$   Postulate I.5 ■

The following are some other theorems which can be proved in a similar way: $A \cup A = A$, $(A')' = A$, $A \cup S = S$, and $A \cap \varnothing = \varnothing$ for any event $A$ in the sample space $S$, $S' = \varnothing$, and $\varnothing' = S$.

In most instances it is easiest to verify rules about events by inspection of appropriate Venn diagrams. In this way it can be shown, for example, that the second distributive law of Postulate I.4 does, indeed, hold when we represent events by means of regions of Venn diagrams. In the first Venn diagram of Figure I.1, $A$ is ruled horizontally, $B \cap C$ is ruled vertically, and $A \cup (B \cap C)$ is represented by the region ruled horizontally and/or vertically; in the second Venn diagram of Figure I.1, $A \cup B$ is ruled horizontally, $A \cup C$ is ruled vertically, and $(A \cup B) \cap (A \cup C)$ is ruled both ways. As can be seen, the region ruled horizontally and/or vertically in the first diagram is the same as that ruled both ways in the second diagram.

[Figure I.1 Venn diagrams showing that $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$; panels: $A \cup (B \cap C)$ and $(A \cup B) \cap (A \cup C)$.]
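Venn diagrams are the intended tool here, but the same rules can also be spot-checked mechanically. The sketch below represents events as Python `frozenset`s and tries every combination of events in a small sample space. It rests on an assumption: an exhaustive check on one four-element sample space (our own choice) illustrates the identities but is not a proof for arbitrary sample spaces.

```python
from itertools import chain, combinations

# A small, hypothetical sample space with four outcomes
S = frozenset(range(4))


def all_events(space):
    """Return every subset of the sample space, i.e., every event."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def complement(a):
    """A' relative to the sample space S."""
    return S - a


events = all_events(S)
empty = frozenset()

# Second distributive law of Postulate I.4: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
assert all(a | (b & c) == (a | b) & (a | c)
           for a in events for b in events for c in events)

# Theorem I.1 and the other theorems listed after its proof
for a in events:
    assert a & a == a and a | a == a
    assert complement(complement(a)) == a
    assert a | S == S and a & empty == empty
assert complement(S) == empty and complement(empty) == S

print(len(events), "events checked")  # 16 events checked
```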


THEORETICAL EXERCISES

1. Use Venn diagrams to verify the first distributive law of Postulate I.4.

2. Use Venn diagrams to verify the two de Morgan laws
(a) $(A \cap B)' = A' \cup B'$;
(b) $(A \cup B)' = A' \cap B'$.

3. Use Venn diagrams to verify that
(a) $A \cup (A \cap B) = A$;
(b) $A \cap (A \cup B) = A$;
(c) $(A \cap B) \cup (A \cap B') = A$;
(d) $A \cup B = (A \cap B) \cup (A \cap B') \cup (A' \cap B)$;
(e) $A \cup (A' \cap B) = A \cup B$.

4. Use Venn diagrams to verify that if $A \subset B$ (namely, $A$ is contained in $B$), then $A \cap B = A$ and $A \cap B' = \varnothing$.

5. Prove that $A \cup A = A$ for any event $A$ in a sample space $S$, justifying each step by means of one of the postulates.

6. Prove that $A \cap \varnothing = \varnothing$ for any event $A$ in a sample space $S$, justifying each step by means of one of the postulates.


Sums and Products

II.1 RULES FOR SUMS AND PRODUCTS

To simplify expressions involving sums and products, the $\sum$ and $\prod$ notations are widely used in statistics. In the usual notation we write

$$\sum_{i=a}^{b} x_i = x_a + x_{a+1} + x_{a+2} + \cdots + x_b$$

and

$$\prod_{i=a}^{b} x_i = x_a \cdot x_{a+1} \cdot x_{a+2} \cdots x_b$$

for any nonnegative integers $a$ and $b$ with $a < b$.

When working with sums or products, it is often helpful to apply the following rules, which can all be verified by writing the respective expressions in full, that is, without the $\sum$ or $\prod$ notation:
THEOREM II.1
1. $\sum_{i=1}^{n} kx_i = k\sum_{i=1}^{n} x_i$
2. $\sum_{i=1}^{n} k = nk$
3. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
4. $\prod_{i=1}^{n} kx_i = k^n\prod_{i=1}^{n} x_i$
5. $\prod_{i=1}^{n} k = k^n$

Double sums, triple sums, ..., are also widely used in statistics, and if we repeatedly apply the definition of $\sum$ given above, we have, for example,

$$\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij} = \sum_{i=1}^{m}(x_{i1} + x_{i2} + \cdots + x_{in})$$

$$= (x_{11} + x_{12} + \cdots + x_{1n}) + (x_{21} + x_{22} + \cdots + x_{2n}) + \cdots + (x_{m1} + x_{m2} + \cdots + x_{mn})$$

Note that when the $x_{ij}$ are thus arranged in a rectangular array, the first subscript denotes the row to which a particular element belongs, and the second subscript denotes the column.

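When we work with double sums, the following theorem is of special interest; it is an immediate consequence of the multinomial expansion of $(x_1 + x_2 + \cdots + x_n)^2$:

THEOREM II.2

$$\sum_{i<j}\sum x_i x_j = \frac{1}{2}\left[\left(\sum_{i=1}^{n} x_i\right)^2 - \sum_{i=1}^{n} x_i^2\right]$$

where

$$\sum_{i<j}\sum x_i x_j = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_i x_j$$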

II.2 SPECIAL SUMS

In the theory of nonparametric statistics, particularly when we deal with rank sums, we often need expressions for the sums of powers of the first $n$ positive integers; namely, expressions for

$$S(n, r) = 1^r + 2^r + 3^r + \cdots + n^r$$

for $r = 0, 1, 2, 3, \ldots$. The following theorem, which the reader will be asked to prove in Exercise 1 below, provides a convenient way of obtaining these sums:

THEOREM II.3

$$\sum_{r=0}^{k-1}\binom{k}{r}S(n, r) = (n + 1)^k - 1$$

for any positive integers $n$ and $k$.

A disadvantage of this theorem is that we have to find the sums $S(n, r)$ one at a time, first for $r = 0$, then for $r = 1$, then for $r = 2$, and so forth. For instance, for $k = 1$ we get

$$\binom{1}{0}S(n, 0) = (n + 1) - 1 = n$$

and, hence, $S(n, 0) = 1^0 + 2^0 + \cdots + n^0 = n$. Similarly, for $k = 2$ we get

$$\binom{2}{0}S(n, 0) + \binom{2}{1}S(n, 1) = (n + 1)^2 - 1$$

$$n + 2S(n, 1) = n^2 + 2n$$

and, hence, $S(n, 1) = 1^1 + 2^1 + \cdots + n^1 = \frac{1}{2}n(n + 1)$. Using the same technique, the reader will be asked to show in Exercise 2 below that

$$S(n, 2) = \frac{1}{6}n(n + 1)(2n + 1) \quad\text{and}\quad S(n, 3) = \frac{1}{4}n^2(n + 1)^2$$
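Theorem II.3 also yields a simple recursive algorithm. With $k = r + 1$, the last term of the sum is $\binom{r+1}{r}S(n, r) = (r + 1)S(n, r)$, and every other term involves sums already found. Here is a Python sketch using exact integer arithmetic; the name `power_sums` is hypothetical.

```python
from math import comb


def power_sums(n, max_r):
    """Return [S(n, 0), S(n, 1), ..., S(n, max_r)], where S(n, r) = 1^r + ... + n^r.

    Uses Theorem II.3 with k = r + 1: the terms for j < r are already known,
    and the remaining term is C(r + 1, r) * S(n, r) = (r + 1) * S(n, r).
    """
    sums = []
    for r in range(max_r + 1):
        k = r + 1
        known = sum(comb(k, j) * sums[j] for j in range(r))
        # (n + 1)^k - 1 - known equals k * S(n, r), so the integer division is exact
        sums.append(((n + 1) ** k - 1 - known) // k)
    return sums


print(power_sums(10, 3))  # [10, 55, 385, 3025]
```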


THEORETICAL EXERCISES

1. Prove Theorem II.3 by making use of the fact that

$$(m + 1)^k - m^k = \sum_{r=0}^{k-1}\binom{k}{r}m^r$$

which follows from the binomial expansion of $(m + 1)^k$.

2. Verify the formulas for $S(n, 2)$ and $S(n, 3)$ given above, and find an expression for $S(n, 4)$.


Statistical Tables

I. Binomial Probabilities
II. Poisson Probabilities
III. Standard Normal Distribution
IV. Values of $t_{\alpha,\nu}$
V. Values of $\chi^2_{\alpha,\nu}$
VI. Values of $F_{0.05,\nu_1,\nu_2}$ and $F_{0.01,\nu_1,\nu_2}$
VII. Factorials and Binomial Coefficients
VIII. Values of $e^x$ and $e^{-x}$
IX. Critical Values of $T$
X. Critical Values of $U$
XI. Critical Values of $u$


TABLE I

Binomial Probabilities*

n x .05 .10 .15 .20 .25 .30 .35 .40 .45 .50
1 0 .9500 .9000 .8500 .8000 .7500 .7000 .6500 .6000 .5500 .5000
1 .0500 .1000 .1500 .2000 .2500 .3000 .3500 .4000 .4500 .5000
2 0 .9025 .8100 .7225 .6400 .5625 .4900 .4225 .3600 .3025 .2500
1 .0950 .1800 .2550 .3200 .3750 .4200 .4550 .4800 .4950 .5000
2 .0025 .0100 .0225 .0400 .0625 .0900 .1225 .1600 .2025 .2500
3 0 .8574 .7290 .6141 .5120 .4219 .3430 .2746 .2160 .1664 .1250
1 .1354 .2430 .3251 .3840 .4219 .4410 .4436 .4320 .4084 .3750
2 .0071 .0270 .0574 .0960 .1406 .1890 .2389 .2880 .3341 .3750
3 .0001 .0010 .0034 .0080 .0156 .0270 .0429 .0640 .0911 .1250
4 0 .8145 .6561 .5220 .4096 .3164 .2401 .1785 .1296 .0915 .0625
1 .1715 .2916 .3685 .4096 .4219 .4116 .3845 .3456 .2995 .2500
2 .0135 .0486 .0975 .1536 .2109 .2646 .3105 .3456 .3675 .3750
3 .0005 .0036 .0115 .0256 .0469 .0756 .1115 .1536 .2005 .2500
4 .0000 .0001 .0005 .0016 .0039 .0081 .0150 .0256 .0410 .0625
5 0 .7738 .5905 .4437 .3277 .2373 .1681 .1160 .0778 .0503 .0312
1 .2036 .3280 .3915 .4096 .3955 .3602 .3124 .2592 .2059 .1562
2 .0214 .0729 .1382 .2048 .2637 .3087 .3364 .3456 .3369 .3125
3 .0011 .0081 .0244 .0512 .0879 .1323 .1811 .2304 .2757 .3125
4 .0000 .0004 .0022 .0064 .0146 .0284 .0488 .0768 .1128 .1562
5 .0000 .0000 .0001 .0003 .0010 .0024 .0053 .0102 .0185 .0312
6 0 .7351 .5314 .3771 .2621 .1780 .1176 .0754 .0467 .0277 .0156
1 .2321 .3543 .3993 .3932 .3560 .3025 .2437 .1866 .1359 .0938
2 .0305 .0984 .1762 .2458 .2966 .3241 .3280 .3110 .2780 .2344
3 .0021 .0146 .0415 .0819 .1318 .1852 .2355 .2765 .3032 .3125
4 .0001 .0012 .0055 .0154 .0330 .0595 .0951 .1382 .1861 .2344
5 .0000 .0001 .0004 .0015 .0044 .0102 .0205 .0369 .0609 .0938
6 .0000 .0000 .0000 .0001 .0002 .0007 .0018 .0041 .0083 .0156
7 0 .6983 .4783 .3206 .2097 .1335 .0824 .0490 .0280 .0152 .0078
1 .2573 .3720 .3960 .3670 .3115 .2471 .1848 .1306 .0872 .0547
2 .0406 .1240 .2097 .2753 .3115 .3177 .2985 .2613 .2140 .1641
3 .0036 .0230 .0617 .1147 .1730 .2269 .2679 .2903 .2918 .2734
4 .0002 .0026 .0109 .0287 .0577 .0972 .1442 .1935 .2388 .2734
5 .0000 .0002 .0012 .0043 .0115 .0250 .0466 .0774 .1172 .1641
6 .0000 .0000 .0001 .0004 .0013 .0036 .0084 .0172 .0320 .0547
7 .0000 .0000 .0000 .0000 .0001 .0002 .0006 .0016 .0037 .0078
8 0 .6634 .4305 .2725 .1678 .1001 .0576 .0319 .0168 .0084 .0039
1 .2793 .3826 .3847 .3355 .2670 .1977 .1373 .0896 .0548 .0312
2 .0515 .1488 .2376 .2936 .3115 .2965 .2587 .2090 .1569 .1094
3 .0054 .0331 .0839 .1468 .2076 .2541 .2786 .2787 .2568 .2188
4 .0004 .0046 .0185 .0459 .0865 .1361 .1875 .2322 .2627 .2734

* Based on Tables of the Binomial Probability Distribution, National Bureau of Standards Applied Mathematics Series No. 6. Washington, D.C.: U.S. Government Printing Office, 1950.
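Each entry of Table I is the binomial probability $b(x; n, p) = \binom{n}{x}p^x(1-p)^{n-x}$ rounded to four decimals, so individual entries can be regenerated as a check. Here is a minimal Python sketch; the function name is hypothetical. Where the unrounded value ends in exactly 5 in the fifth decimal place, the printed table and Python's formatting may round in different directions.

```python
from math import comb


def binomial_probability(x, n, p):
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Regenerate the row n = 5, x = 3 of Table I, rounded to four decimals
ps = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
print(" ".join(f"{binomial_probability(3, 5, p):.4f}" for p in ps))
```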
n x .05 .10 .15 .20 .25 .30 .35 .40 .45 .50
8 5 .0000 .0004 .0026 .0092 .0231 .0467 .0808 .1239 .1719 .2188
6 .0000 .0000 .0002 .0011 .0038 .0100 .0217 .0413 .0703 .1094
7 .0000 .0000 .0000 .0001 .0004 .0012 .0033 .0079 .0164 .0312
8 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0007 .0017 .0039
9 0 .6302 .3874 .2316 .1342 .0751 .0404 .0207 .0101 .0046 .0020
1 .2985 .3874 .3679 .3020 .2253 .1556 .1004 .0605 .0339 .0176
2 .0629 .1722 .2597 .3020 .3003 .2668 .2162 .1612 .1110 .0703
3 .0077 .0446 .1069 .1762 .2336 .2668 .2716 .2508 .2119 .1641
4 .0006 .0074 .0283 .0661 .1168 .1715 .2194 .2508 .2600 .2461
5 .0000 .0008 .0050 .0165 .0389 .0735 .1181 .1672 .2128 .2461
6 .0000 .0001 .0006 .0028 .0087 .0210 .0424 .0743 .1160 .1641
7 .0000 .0000 .0000 .0003 .0012 .0039 .0098 .0212 .0407 .0703
8 .0000 .0000 .0000 .0000 .0001 .0004 .0013 .0035 .0083 .0176
9 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0008 .0020
10 0 .5987 .3487 .1969 .1074 .0563 .0282 .0135 .0060 .0025 .0010
1 .3151 .3874 .3474 .2684 .1877 .1211 .0725 .0403 .0207 .0098
2 .0746 .1937 .2759 .3020 .2816 .2335 .1757 .1209 .0763 .0439
3 .0105 .0574 .1298 .2013 .2503 .2668 .2522 .2150 .1665 .1172
4 .0010 .0112 .0401 .0881 .1460 .2001 .2377 .2508 .2384 .2051
5 .0001 .0015 .0085 .0264 .0584 .1029 .1536 .2007 .2340 .2461
6 .0000 .0001 .0012 .0055 .0162 .0368 .0689 .1115 .1596 .2051
7 .0000 .0000 .0001 .0008 .0031 .0090 .0212 .0425 .0746 .1172
8 .0000 .0000 .0000 .0001 .0004 .0014 .0043 .0106 .0229 .0439
9 .0000 .0000 .0000 .0000 .0000 .0001 .0005 .0016 .0042 .0098
10 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0010
11 0 .5688 .3138 .1673 .0859 .0422 .0198 .0088 .0036 .0014 .0005
1 .3293 .3835 .3248 .2362 .1549 .0932 .0518 .0266 .0125 .0054
2 .0867 .2131 .2866 .2953 .2581 .1998 .1395 .0887 .0513 .0269
3 .0137 .0710 .1517 .2215 .2581 .2568 .2254 .1774 .1259 .0806
4 .0014 .0158 .0536 .1107 .1721 .2201 .2428 .2365 .2060 .1611
5 .0001 .0025 .0132 .0388 .0803 .1321 .1830 .2207 .2360 .2256
6 .0000 .0003 .0023 .0097 .0268 .0566 .0985 .1471 .1931 .2256
7 .0000 .0000 .0003 .0017 .0064 .0173 .0379 .0701 .1128 .1611
8 .0000 .0000 .0000 .0002 .0011 .0037 .0102 .0234 .0462 .0806
9 .0000 .0000 .0000 .0000 .0001 .0005 .0018 .0052 .0126 .0269
10 .0000 .0000 .0000 .0000 .0000 .0000 .0002 .0007 .0021 .0054
11 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0002 .0005
12 0 .5404 .2824 .1422 .0687 .0317 .0138 .0057 .0022 .0008 .0002
1 .3413 .3766 .3012 .2062 .1267 .0712 .0368 .0174 .0075 .0029
2 .0988 .2301 .2924 .2835 .2323 .1678 .1088 .0639 .0339 .0161
3 .0173 .0852 .1720 .2362 .2581 .2397 .1954 .1419 .0923 .0537
4 .0021 .0213 .0683 .1329 .1936 .2311 .2367 .2128 .1700 .1208
5 .0002 .0038 .0193 .0532 .1032 .1585 .2039 .2270 .2225 .1934
6 .0000 .0005 .0040 .0155 .0401 .0792 .1281 .1766 .2124 .2256
7 .0000 .0000 .0006 .0033 .0115 .0291 .0591 .1009 .1489 .1934
8 .0000 .0000 .0001 .0005 .0024 .0078 .0199 .0420 .0762 .1208
9 .0000 .0000 .0000 .0001 .0004 .0015 .0048 .0125 .0277 .0537


TABLE I (continued)

n x .05 .10 .15 .20 .25 .30 .35 .40 .45
12 10 .0000 .0000 .0000 .0000 .0000 .0002 .0008 .0025 .0068 
1 “0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0010 
12 “0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 
з 0 .5133 .2542 .1209 .0550 .0238 .0097 .0037 .0013 .0004 
1 "3512 .3072 .2774 .1787 .1029 .0540 .0259 .0113 .0045 
2 71109 .2448 .2937 .2680 .2059 .1388 .0836 .0453 .0220 
з 10214 .0997 .1900 .2457 .2517 .2181 .1651 .1107 .0660 
4 “0028 0277 .0838 .1535 .2097 .2337 .2222 .1845 .1850 
5 0003 .0055 .0266 .0691 .1258 .1803 .2154 .2214 .1989 
6 :0000 .0008 .0063 .0230 .0559 .1030 .1546 .1968 .2109 
7 10000 .0001 .0011 .0058 .0186 .0442 .0833 .1312 .1775 
8 “9000 10000 .0001 .0011 .0047 .0142 .0336 .0656 .1089 
9 “9000 10000 1.0000 .0001 .0009 .0034 .0101 .0243 .0495 
10 .0000 0000 .0000 .0000 .0001 .0006 .0022 .0085 .0102 
11 70000 0000 .0000 .0000 .0000 .0001 .0003 .0012 .0036 
12 “9000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0005 
13 "0000 0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 
14 0 .4877 .2288 .1028 .0440 .0178 .0068 .0024 -0008 .0002 
1 78503 .3559 .2539 .1539 .0832 .0407 0181 .0073 .0027 
2 "1229 .2570 .2912 .2501 .1802 .1134 .0634 :0317 .0141 
3 “0259 .1142 .2056 .2501 .2402 .1943 .1366 .0845  .0402 
4 70037 .0349 .0998 .1720 .2202 .2290 .2022 . 1549 .1040 
5 .0004 .0078 .0352 .0860 .1408 .1903 .2178 :2066 .1701 
6 70000 0013 .0093 .0322 .0734 .1262 .1759 .2066  .2088 
7 70000 10002 .0019 .0092 .0280 .0618 .1082 .1574 .1952 
8 70000 .0000 .0003 .0020 .0082 .0232 -0510 .0918 .1398 
9 70000 .0000 .0000 .0003 .0018 .0066 0188 .0408 .0762 
10 .0000 .0000 .0000 .0000 .0003 .0014 .0049 .0136 .0312 
11 `оо0оо .0000 .0000 .0000 .0000 .0002 .0010 .0033 .0093 
12 “9000 .0000 .0000 .0000 .0000 .0000 .0001 .0005 .0019 
13 70000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 0002 
14 70000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 
15 0 .4633 .2059 .0874 .0352 .0134 .0047 .0018 .0005 .0001 
1 "3658 .3432 .2312 .1319 0668 .0305 .0126 .0047 .0016 
2 "1348 2669 .2856 .2309 .1559 .0916 .0476 .0219 .0090 
3 10307 .1285 .2184 .2501 .2252 .1700 1110 .0634 .0318 
4 70049 .0428 .1156 .1876 .2252 .2186 1792 .1208 .0780 
5 .0006 .0105 .0449 .1032 .1651 .2001 .2123 .1850 .1404 
6 10000 .0019 .0132 .0430 .0917 .1472 .1906 .2066 .1914 
7 10000 .0003 .0030 .0138 .0393 .0811 .1319 .1771 .2013 
8 70000 .0000 .0005 .0035 .0131 10348 .0710 ,1181 .1647 
9 “0000 .0000 .0001 .0007 .0034 “0116 .0298 .0612 .1048 
10 .0000 .0000 .0000 .0001 .0007 .0930 .0096 .0245 .0515 
11 “9000 {0000 .0000 .0000 .0001 .0009 .0024 .0074 .0191 
12 “9000 .0000 .0000 .0000 .0000 .0001 .0004 .0016 .0052 
13 70000 .0000 .0000 .0000 0000 “0000 .0001 .0003 .0010 
14 10000 .0000 0000 .0000 .0000 .0000 .0000 .0000 .0001 
15 .0000 .0000 .0000 .0000 .0000 .0000 .0000 0000 .0000 
n x .05 .10 .15 .20
16 0 .4401 .1853 .0743 .0281 
1 .$706 .3204 .2097 .1126 
2 .1463 .2745 .2775 .2111 
3 0359 .1428 .2285 .2463 
4 .0061 .0514 .1311 .2001 
5 .0008 .0137 .0555 .1201 
6 .0001 .0028 .0180 .0550 
7 -0000 .0004 .0045 .0197 
8 ‚0000 .0001 .0009 .0055 
9 .0000 .0000 .0001 .0012 
10 .0000 0000 .0000 .0002 
11 0000 .0000 .0000 .0000 
12 .0000 .0000 .0000 .0000 
13 .0000 .0000 .0000 .0000 
14 .0000 .0000 .0000 .0000 
15 .0000 .0000 .0000 .0000 
16 .0000 .0000 ‚0000 .0000 
MO 4181 .1608 .0031 
1 B741 .3150  .1893 
2 1575 2800 .2073 
3 .0415 .1556 ‚2359 .2303 
4 .0076 .0605 .1457 .2003 
5 .0010 .0175 .0668 .1361 
6 0001 .0039 .0236 .0680 
7 .0000 „0007 .0065 .0267 
8 .0000 .0001 .0014 .0084 
9 .0000 .0000 .0003 .0021 
10 0000 .0000 .0000 .0004 
11 ‚0000 .0000 .0000 .ооо1 
12 .0000 .0000 .0000 .оооо 
18 .0000 .0000 .0000 .0000 
14 .0000 .0000 .0000 .0000 
15 .0000 .0000 .0000 .0000 
16 .0000 .0000 .0000 .0000 
17 .0000 0000 .0000 .0000 
18 0 .3972 1501 .0536 .0180 
1 8763 .3002 .1704 .0811 
2 1083 .2835/ .2556 .1723 
3 .0473 .1680 .2400 .2297 
4 .0093  .&7 .1692. .2153 
5. ‚0014 0218 .0787 .1507 
6 .0002 .0052 .0301 .0816 
7 .0000 .0010 .0091 .0350 
8 .0000 .0002 .0022 .0120 
9 .0000 .0000 .0004 .0033 


.30 .35 .40 .45 .50 
.0033 .0010 .0003 .0001 .0000 
.0228 .0087 .0030 .0009 .0002 

0732 .0353 0150 .0056 .0018 
.1465 0888 .0468 .0215 .0085 
.2040 .1553 .1014 .0572 .0278 
.2099 .2008 1623 1123 .0667 
.1649 1982 .1983 1684 .1222 
.1010 .1524 1889 1969 .1746 
.0487 .0923 .1417 .1812 .1964 

0185 .0442 .0840 .1318 .1746 
.0056 .0167 .0392 .0755 .1222 
.0013 .0049 .0142 .0337 .0667 
.0002 .0011 .0040 .0115 .0278 
.0000 .0002 .0008 .0029 .0085 
.0000 .0000 .0001 .0005 .0018 
.0000 .0000 .0000 .0001 .0002 
.0000 .0000 .0000 0000 .0000 
.0023 .0007 0002 .0000 .0000 

0169 .0060 .0019 .0005 .0001 
-0581 0260 .0102 .0035 .0010 

1245 0701 0341 0144 ‚0052 

1868 1320 .0796 .0411 .0182 
.2081 .1849 .1379 .0875 .0472 

1784 1991 1839 1432 .0944 

1201 1685 1927 1841  .1484 

0644 1134 .1606 1883 .1855 
-0276 .0611 .1070 1540 .1855 
.0095  .0263 0571  .1008 .1484 
.0026 .0090 .0242 .0525 .0944 
.0006  .002 .0081 .0215 .0472 
.0001 .0005 .0021 .0068 .0182 
.0000 .0001 .0004 .0016 .0052 

0000 .0000 .0001 .0003 .0010 
.0000 .0000 .0000 .0000 .0001 
.0000 .0000 .0000 .0000 .0000 
.0016 .0004 .0001 .0000 .0000 
-0126 .0042 .0012 .0003 .0001 

0458 .0190 .0069 .0022 ,0006 

1046 .0547 .0246 .0095 .0031 
-1681 1104 .0614 .0291 .0117 
.2017 .1664 .1146 .0666 .0327 
.1873 .1941 .1655 1181 .0708 

1376 1792 .1892 .1657 .1214 
.0811 .1327 .1734 .1864 .1669 
.0386 .0794 .1284 .1694 1855 


TABLE I (continued)

n x .05 .10 .15 .20 .25 .30 .35 .40 .45 .50
18 10 0000 .0000 .0001 .0008 .0042 .0149 0385 .0771 .1248 1669 
11 0000 .0000 .0000 0001 .0010 .0046 0151 .0374 0742 1214 
12 0000 .0000 .0000 .0000 0002 .0012 .0047 .0145 .0354 0708 
13 0000 .0000 .0000 .0000 .0000 0002 .0012 .0045 .0134 0327 
14 0000 .0000 .0000 .0000 “0000 .0000 .0002 .0011 .0039 0117 
15 0000 .0000 .0000 .0000 .0000 .0000 .0000 .0002 0009 .0031 
16 0000 .0000 .0000 .0000 “0000 .0000 .0000 .0000 .0001 0006 
17 0000 .0000 .0000 .0000 “0000 .0000 .0000 .0000 0000 .0001 
18 0000 .0000 .0000 .0000 “0000 .0000 .0000 .0000 0000 .0000 
19 0 3774 .1351 .0456 .0144 .0042 .0011 .0003 .0001 0000 .0000 
H 3774 .2852 .1529 .0085 “0268 .0093 .0029 .0008 0002 .0000 
2 1787 .2852 .2428 .1540 [0803 .0358 .0138 .0046 0013 .0003 
3 0533 .1796 .2428 .2182 . 1517 .0869 .0422 .0175 0062 .0018 
4 0112 .0798 .1714 .2182 /2093 .1491 .0909 .0467 0203 .0074 
5 0018 .0266 .0907 -1636 .2023 .1916 .1468 .09з33 .0497 .0222 
6 0002 .0069 :0374 .0955 - 1574 .1918 .1844 .1451 .0949 .0518 
7 0000 .0014 .0122 .0443 10974 .1525 .1844 .1797 . 1443 .0961 
8 0000 .0002 .0032 .0166 10487: .0981 .1489 .1797 . 1771 .1442 
9 0000 .0000 .0007 -0051 .0198 .0514 .0980 .1464 .1771 .1702 
19 0000 .0000 .0001 .0013 .0066 .0220 .0528 .0976 .1449 ‚1762 
11 0000 .0000 .0000 .0003 0018 .0077 .0233 10532 .0970 .1442 
12 0000 .0000 0000 .0000 .0004 .0022 0083 .0237 0529 -0981 
13 0000 .0000 .0000 0000 .0001 .0005 .0024 0085 ` -0233 .0518 
14 0000 .0000 .0000 0000 .0000 .0001 .0006 .0024 .0082 0222 
15 .0000 .0000 .0000 .0000 .0000 .0000 0001. .0005 .0022 .0074 
16 .0000 .0000 .0000 .0000 .0000 .0000 0000 .0001 .0005 0018 
17 .0000 .0000 .0000 0000 .0000 .0000 0000 .0000 .0001 0003 
18 .0000 .0000 .0000 0000 .0000 .0000 0000 .0000 .0000 -0000 
19 .0000 0000 .0000 “0000 .0000  .0000 0000 .0000 0000 0000 
20 0 (3585 .1216 .0388 0115 .0032 .0008 .0002 .0000 .0000 0000 
1 ‘3774 .2702, .1368 0576 .0211 .0068 .0020 .0005 .0001 0000 
2 1887 .2852 .2293 1369 .0669 .0278 .0100 .0031 .0008 10002 
з 0596 .1901 .2428 2054 1339 .0716 0323 .0123 .0040 0011 
4 0133 .0898 .1821 2182 .1897 .1304 .0738 .0350 .0139 0046 
5 ‚0022 .0319 .1028 1746 .2023 .1789 - 1272 .0746 .0365 0148 
6 .0003 .0089 .0454 1091 .1086 .1916 (1712 .1244 .0746 :0370 
7 .0000 .0020  .0160 0545 .1124 .1643 ‘1844 .1659 .1221 ‚0739 
8 .0000 .0004  .0046 0222 .0009 .1144 · 1614 .1797 .1023 . 1201 
9 .0000 .0001 .0011 0074 .0271 .0654 - 1158 .1597 .1771 1662 
10 .0000 .0000  .0002 0020 .0099 0308 .0686 .1171 ‚1593 .1762 
11 .0000 .0000 .0000 0005 .0030 .0120 10336 .0710 .1185 ‚1602 
12 .0000 .0000 .0000 0001 .0008 -0039 .0136 .0355 ,0727 .1201 
13 .0000 .0000 ‚0000 .0000 .0002 (0010 .0045 -0140 .0366 .0739 
14 .0000 .0000 .0000 .0000 .0000 0002 .0012 0049 10150 .0370 
15 .0000 .0000 .0000 .0000 .0000 .0000 .0003 10013 .0049 0148 
16 .0000 .0000 .0000 .0000 .0000 “9000 .0000 .0003 ,0013  .0046 
17 .0000 .0000 .0000 10000 .0000  .0000 “0000 .0000 ‚0002 .0011 
18 .0000 .0000 -0000 “9000 .0000 .0000 0000 .0000 .0000 ‚0002 
19 .0000 .0000 .0000 .0000 .0000 .0000 +0000 10000 .0000  .0000 
20 .0000 .0000 .0000 .0000  .0000 >.0000 .0000 .0000 .0000 0000 
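Every entry of Table I is a binomial probability b(x; n, θ) = C(n, x) θ^x (1 − θ)^(n − x), rounded to four decimals, so a doubtful value in a scanned copy can be recomputed directly. A minimal Python sketch follows; the function name `binomial_pmf` is ours, not the book's.

```python
from math import comb


def binomial_pmf(x: int, n: int, theta: float) -> float:
    """Binomial probability b(x; n, theta) = C(n, x) * theta**x * (1 - theta)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


if __name__ == "__main__":
    # One column of the table: n = 20, theta = .25, rounded to four decimals as printed.
    for x in range(21):
        print(f"{x:2d}  {binomial_pmf(x, 20, 0.25):.4f}")
```

For example, `binomial_pmf(6, 20, 0.25)` rounds to .1686, the entry for n = 20, x = 6, θ = .25.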
TABLE II

Poisson Probabilities†

x \ λ 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0 .9048 .8187 .7408 .6703 .6065 .5488 .4966 .4493 .4066 .3679
1 .0905 .1637 .2222 .2681 .3033 .3293 .3476 .3595 .3659 .3679
2 .0045 .0164 .0333 .0536 .0758 .0988 .1217 .1438 .1647 .1839
3 .0002 .0011 .0033 .0072 .0126 .0198 .0284 .0383 .0494 .0613
4 .0000 .0001 .0002 .0007 .0016 .0030 .0050 .0077 .0111 .0153
5 .0000 .0000 .0000 .0001 .0002 .0004 .0007 .0012 .0020 .0031
6 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0003 .0005
7 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001

x \ λ 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
0 .3329 .3012 .2725 .2466 .2231 .2019 .1827 .1653 .1496 .1353
1 .3662 .3614 .3543 .3452 .3347 .3230 .3106 .2975 .2842 .2707
2 .2014 .2169 .2303 .2417 .2510 .2584 .2640 .2678 .2700 .2707
3 .0738 .0867 .0998 .1128 .1255 .1378 .1496 .1607 .1710 .1804
4 .0203 .0260 .0324 .0395 .0471 .0551 .0636 .0723 .0812 .0902
5 .0045 .0062 .0084 .0111 .0141 .0176 .0216 .0260 .0309 .0361
6 .0008 .0012 .0018 .0026 .0035 .0047 .0061 .0078 .0098 .0120
7 .0001 .0002 .0003 .0005 .0008 .0011 .0015 .0020 .0027 .0034
8 .0000 .0000 .0001 .0001 .0001 .0002 .0003 .0005 .0006 .0009
9 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0002

x \ λ 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0
0 .1225 .1108 .1003 .0907 .0821 .0743 .0672 .0608 .0550 .0498
1 .2572 .2438 .2306 .2177 .2052 .1931 .1815 .1703 .1596 .1494
2 .2700 .2681 .2652 .2613 .2565 .2510 .2450 .2384 .2314 .2240
3 .1890 .1966 .2033 .2090 .2138 .2176 .2205 .2225 .2237 .2240
4 .0992 .1082 .1169 .1254 .1336 .1414 .1488 .1557 .1622 .1680
5 .0417 .0476 .0538 .0602 .0668 .0735 .0804 .0872 .0940 .1008
6 .0146 .0174 .0206 .0241 .0278 .0319 .0362 .0407 .0455 .0504
7 .0044 .0055 .0068 .0083 .0099 .0118 .0139 .0163 .0188 .0216
8 .0011 .0015 .0019 .0025 .0031 .0038 .0047 .0057 .0068 .0081
9 .0003 .0004 .0005 .0007 .0009 .0011 .0014 .0018 .0022 .0027
10 .0001 .0001 .0001 .0002 .0002 .0003 .0004 .0005 .0006 .0008
11 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0002 .0002
12 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001


† Reproduced by permission from Handbook of Probability and Statistics with Tables, by R. S. Burington and D. C. May, Jr. New York: McGraw-Hill Book Company, 1953.
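Table II lists Poisson probabilities p(x; λ) = λ^x e^(−λ)/x!, rounded to four decimals. A minimal Python sketch for recomputing an entry follows; `poisson_pmf` is a hypothetical helper name, not something defined in the text.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability p(x; lambda) = lambda**x * e**(-lambda) / x!."""
    return lam**x * exp(-lam) / factorial(x)


if __name__ == "__main__":
    # One row of the table: x = 2 for lambda = 0.1, 0.2, ..., 1.0.
    print(" ".join(f"{poisson_pmf(2, k / 10):.4f}" for k in range(1, 11)))
```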
TABLE II (continued)

x \ λ 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0
0 .0450 .0408 .0369 .0334 .0302 .0273 .0247 .0224 .0202 .0183
1 .1397 .1304 .1217 .1135 .1057 .0984 .0915 .0850 .0789 .0733
2 .2165 .2087 .2008 .1929 .1850 .1771 .1692 .1615 .1539 .1465
3 .2237 .2226 .2209 .2186 .2158 .2125 .2087 .2046 .2001 .1954
4 .1734 .1781 .1823 .1858 .1888 .1912 .1931 .1944 .1951 .1954
5 .1075 .1140 .1203 .1264 .1322 .1377 .1429 .1477 .1522 .1563
6 .0555 .0608 .0662 .0716 .0771 .0826 .0881 .0936 .0989 .1042
7 .0246 .0278 .0312 .0348 .0385 .0425 .0466 .0508 .0551 .0595
8 .0095 .0111 .0129 .0148 .0169 .0191 .0215 .0241 .0269 .0298
9 .0033 .0040 .0047 .0056 .0066 .0076 .0089 .0102 .0116 .0132
10 .0010 .0013 .0016 .0019 .0023 .0028 .0033 .0039 .0045 .0053
11 .0003 .0004 .0005 .0006 .0007 .0009 .0011 .0013 .0016 .0019
12 .0001 .0001 .0001 .0002 .0002 .0003 .0003 .0004 .0005 .0006
13 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0002 .0002
14 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001

x \ λ 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0
0 .0166 .0150 .0136 .0123 .0111 .0101 .0091 .0082 .0074 .0067
1 .0679 .0630 .0583 .0540 .0500 .0462 .0427 .0395 .0365 .0337
2 .1393 .1323 .1254 .1188 .1125 .1063 .1005 .0948 .0894 .0842
3 .1904 .1852 .1798 .1743 .1687 .1631 .1574 .1517 .1460 .1404
4 .1951 .1944 .1933 .1917 .1898 .1875 .1849 .1820 .1789 .1755
5 .1600 .1633 .1662 .1687 .1708 .1725 .1738 .1747 .1753 .1755
6 .1093 .1143 .1191 .1237 .1281 .1323 .1362 .1398 .1432 .1462
7 .0640 .0686 .0732 .0778 .0824 .0869 .0914 .0959 .1002 .1044
8 .0328 .0360 .0393 .0428 .0463 .0500 .0537 .0575 .0614 .0653
9 .0150 .0168 .0188 .0209 .0232 .0255 .0280 .0307 .0334 .0363
10 .0061 .0071 .0081 .0092 .0104 .0118 .0132 .0147 .0164 .0181
11 .0023 .0027 .0032 .0037 .0043 .0049 .0056 .0064 .0073 .0082
12 .0008 .0009 .0011 .0014 .0016 .0019 .0022 .0026 .0030 .0034
13 .0002 .0003 .0004 .0005 .0006 .0007 .0008 .0009 .0011 .0013
14 .0001 .0001 .0001 .0001 .0002 .0002 .0003 .0003 .0004 .0005
15 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0001 .0002

x \ λ 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0
0 .0061 .0055 .0050 .0045 .0041 .0037 .0033 .0030 .0027 .0025
1 .0311 .0287 .0265 .0244 .0225 .0207 .0191 .0176 .0162 .0149
2 .0793 .0746 .0701 .0659 .0618 .0580 .0544 .0509 .0477 .0446
3 .1348 .1293 .1239 .1185 .1133 .1082 .1033 .0985 .0938 .0892
4 .1719 .1681 .1641 .1600 .1558 .1515 .1472 .1428 .1383 .1339
5 .1753 .1748 .1740 .1728 .1714 .1697 .1678 .1656 .1632 .1606
6 .1490 .1515 .1537 .1555 .1571 .1584 .1594 .1601 .1605 .1606
7 .1086 .1125 .1163 .1200 .1234 .1267 .1298 .1326 .1353 .1377
8 .0692 .0731 .0771 .0810 .0849 .0887 .0925 .0962 .0998 .1033
9 .0392 .0423 .0454 .0486 .0519 .0552 .0586 .0620 .0654 .0688
10 .0200 .0220 .0241 .0262 .0285 .0309 .0334 .0359 .0386 .0413
11 .0093 .0104 .0116 .0129 .0143 .0157 .0173 .0190 .0207 .0225
12 .0039 .0045 .0051 .0058 .0065 .0073 .0082 .0092 .0102 .0113
13 .0015 .0018 .0021 .0024 .0028 .0032 .0036 .0041 .0046 .0052
14 .0006 .0007 .0008 .0009 .0011 .0013 .0015 .0017 .0019 .0022
15 .0002 .0002 .0003 .0003 .0004 .0005 .0006 .0007 .0008 .0009
16 .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002 .0003 .0003
17 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001 .0001 .0001

x \ λ 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0
0 .0022 .0020 .0018 .0017 .0015 .0014 .0012 .0011 .0010 .0009
1 .0137 .0126 .0116 .0106 .0098 .0090 .0082 .0076 .0070 .0064
2 .0417 .0390 .0364 .0340 .0318 .0296 .0276 .0258 .0240 .0223
3 .0848 .0806 .0765 .0726 .0688 .0652 .0617 .0584 .0552 .0521
4 .1294 .1249 .1205 .1162 .1118 .1076 .1034 .0992 .0952 .0912
5 .1579 .1549 .1519 .1487 .1454 .1420 .1385 .1349 .1314 .1277
6 .1605 .1601 .1595 .1586 .1575 .1562 .1546 .1529 .1511 .1490
7 .1399 .1418 .1435 .1450 .1462 .1472 .1480 .1486 .1489 .1490
8 .1066 .1099 .1130 .1160 .1188 .1215 .1240 .1263 .1284 .1304
9 .0723 .0757 .0791 .0825 .0858 .0891 .0923 .0954 .0985 .1014
10 .0441 .0469 .0498 .0528 .0558 .0588 .0618 .0649 .0679 .0710
11 .0245 .0265 .0285 .0307 .0330 .0353 .0377 .0401 .0426 .0452
12 .0124 .0137 .0150 .0164 .0179 .0194 .0210 .0227 .0245 .0264
13 .0058 .0065 .0073 .0081 .0089 .0098 .0108 .0119 .0130 .0142
14 .0025 .0029 .0033 .0037 .0041 .0046 .0052 .0058 .0064 .0071
15 .0010 .0012 .0014 .0016 .0018 .0020 .0023 .0026 .0029 .0033
16 .0004 .0005 .0005 .0006 .0007 .0008 .0010 .0011 .0013 .0014
17 .0001 .0002 .0002 .0002 .0003 .0003 .0004 .0004 .0005 .0006
18 .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0002 .0002 .0002
19 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001

x \ λ 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0
0 .0008 .0007 .0007 .0006 .0006 .0005 .0005 .0004 .0004 .0003
1 .0059 .0054 .0049 .0045 .0041 .0038 .0035 .0032 .0029 .0027
2 .0208 .0194 .0180 .0167 .0156 .0145 .0134 .0125 .0116 .0107
3 .0492 .0464 .0438 .0413 .0389 .0366 .0345 .0324 .0305 .0286
4 .0874 .0836 .0799 .0764 .0729 .0696 .0663 .0632 .0602 .0573
5 .1241 .1204 .1167 .1130 .1094 .1057 .1021 .0986 .0951 .0916
6 .1468 .1445 .1420 .1394 .1367 .1339 .1311 .1282 .1252 .1221
7 .1489 .1486 .1481 .1474 .1465 .1454 .1442 .1428 .1413 .1396
8 .1321 .1337 .1351 .1363 .1373 .1382 .1388 .1392 .1395 .1396
9 .1042 .1070 .1096 .1121 .1144 .1167 .1187 .1207 .1224 .1241


[Entries for x = 10 and above at λ = 7.1 to 8.0, and the block for λ = 8.1 to 9.0, are illegible in this copy.]

x \ λ 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10
0 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0000
1 .0010 .0009 .0009 .0008 .0007 .0007 .0006 .0005 .0005 .0005
2 .0046 .0043 .0040 .0037 .0034 .0031 .0029 .0027 .0025 .0023
3 .0140 .0131 .0123 .0115 .0107 .0100 .0093 .0087 .0081 .0076
4 .0319 .0302 .0285 .0269 .0254 .0240 .0226 .0213 .0201 .0189
5 .0581 .0555 .0530 .0506 .0483 .0460 .0439 .0418 .0398 .0378
6 .0881 .0851 .0822 .0793 .0764 .0736 .0709 .0682 .0656 .0631
7 .1145 .1118 .1091 .1064 .1037 .1010 .0982 .0955 .0928 .0901
8 .1302 .1286 .1269 .1251 .1232 .1212 .1191 .1170 .1148 .1126
9 .1317 .1315 .1311 .1306 .1300 .1293 .1284 .1274 .1263 .1251
10 .1198 .1210 .1219 .1228 .1235 .1241 .1245 .1249 .1250 .1251
11 .0991 .1012 .1031 .1049 .1067 .1083 .1098 .1112 .1125 .1137
12 .0752 .0776 .0799 .0822 .0844 .0866 .0888 .0908 .0928 .0948
13 .0526 .0549 .0572 .0594 .0617 .0640 .0662 .0685 .0707 .0729
14 .0342 .0361 .0380 .0399 .0419 .0439 .0459 .0479 .0500 .0521
15 .0208 .0221 .0235 .0250 .0265 .0281 .0297 .0313 .0330 .0347
16 .0118 .0127 .0137 .0147 .0157 .0168 .0180 .0192 .0204 .0217
17 .0063 .0069 .0075 .0081 .0088 .0095 .0103 .0111 .0119 .0128
18 .0032 .0035 .0039 .0042 .0046 .0051 .0055 .0060 .0065 .0071
19 .0015 .0017 .0019 .0021 .0023 .0026 .0028 .0031 .0034 .0037
20 .0007 .0008 .0009 .0010 .0011 .0012 .0014 .0015 .0017 .0019
21 .0003 .0003 .0004 .0004 .0005 .0006 .0006 .0007 .0008 .0009
22 .0001 .0001 .0002 .0002 .0002 .0002 .0003 .0003 .0004 .0004
23 .0000 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0002 .0002
24 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0001 .0001

x \ λ 11 12 13 14 15 16 17 18 19 20
0 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000
1 .0002 .0001 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000
2 .0010 .0004 .0002 .0001 .0000 .0000 .0000 .0000 .0000 .0000
3 .0037 .0018 .0008 .0004 .0002 .0001 .0000 .0000 .0000 .0000
4 .0102 .0053 .0027 .0013 .0006 .0003 .0001 .0001 .0000 .0000
5 .0224 .0127 .0070 .0037 .0019 .0010 .0005 .0002 .0001 .0001
6 .0411 .0255 .0152 .0087 .0048 .0026 .0014 .0007 .0004 .0002
7 .0646 .0437 .0281 .0174 .0104 .0060 .0034 .0018 .0010 .0005
8 .0888 .0655 .0457 .0304 .0194 .0120 .0072 .0042 .0024 .0013
9 .1085 .0874 .0661 .0473 .0324 .0213 .0135 .0083 .0050 .0029
10 .1194 .1048 .0859 .0663 .0486 .0341 .0230 .0150 .0095 .0058
11 .1194 .1144 .1015 .0844 .0663 .0496 .0355 .0245 .0164 .0106
12 .1094 .1144 .1099 .0984 .0829 .0661 .0504 .0368 .0259 .0176
13 .0926 .1056 .1099 .1060 .0956 .0814 .0658 .0509 .0378 .0271
14 .0728 .0905 .1021 .1060 .1024 .0930 .0800 .0655 .0514 .0387
15 .0534 .0724 .0885 .0989 .1024 .0992 .0906 .0786 .0650 .0516
16 .0367 .0543 .0719 .0866 .0960 .0992 .0963 .0884 .0772 .0646
17 .0237 .0383 .0550 .0713 .0847 .0934 .0963 .0936 .0863 .0760
18 .0145 .0256 .0397 .0554 .0706 .0830 .0909 .0936 .0911 .0844
19 .0084 .0161 .0272 .0409 .0557 .0699 .0814 .0887 .0911 .0888
20 .0046 .0097 .0177 .0286 .0418 .0559 .0692 .0798 .0866 .0888
21 .0024 .0055 .0109 .0191 .0299 .0426 .0560 .0684 .0783 .0846
22 .0012 .0030 .0065 .0121 .0204 .0310 .0433 .0560 .0676 .0769
23 .0006 .0016 .0037 .0074 .0133 .0216 .0320 .0438 .0559 .0669
24 .0003 .0008 .0020 .0043 .0083 .0144 .0226 .0328 .0442 .0557
25 .0001 .0004 .0010 .0024 .0050 .0092 .0154 .0237 .0336 .0446
26 .0000 .0002 .0005 .0013 .0029 .0057 .0101 .0164 .0246 .0343
27 .0000 .0001 .0002 .0007 .0016 .0034 .0063 .0109 .0173 .0254
28 .0000 .0000 .0001 .0003 .0009 .0019 .0038 .0070 .0117 .0181
29 .0000 .0000 .0001 .0002 .0004 .0011 .0023 .0044 .0077 .0125
30 .0000 .0000 .0000 .0001 .0002 .0006 .0013 .0026 .0049 .0083
31 .0000 .0000 .0000 .0000 .0001 .0003 .0007 .0015 .0030 .0054
32 .0000 .0000 .0000 .0000 .0001 .0001 .0004 .0009 .0018 .0034
33 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0005 .0010 .0020
34 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0006 .0012
35 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0003 .0007
36 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002 .0004
37 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0002
38 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001
39 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001
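For the larger values of λ it is convenient to build a whole column at once from p(0; λ) = e^(−λ) and the ratio p(x + 1; λ)/p(x; λ) = λ/(x + 1), which follows directly from the formula for p(x; λ). The sketch below uses this recursion to regenerate a column; it is not necessarily how the published table was computed, and the name `poisson_column` is ours.

```python
from math import exp


def poisson_column(lam: float, x_max: int) -> list[float]:
    """Return [p(0; lam), ..., p(x_max; lam)] using p(x + 1) = p(x) * lam / (x + 1)."""
    probs = [exp(-lam)]
    for x in range(x_max):
        probs.append(probs[-1] * lam / (x + 1))
    return probs


if __name__ == "__main__":
    # Column for lambda = 20, rounded as in the table.
    for x, p in enumerate(poisson_column(20.0, 39)):
        print(f"{x:2d}  {p:.4f}")
```

The recursion avoids computing large powers and factorials separately, and it makes the equal pairs in the table easy to see: when λ is an integer, p(λ − 1; λ) = p(λ; λ).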
TABLE III


Standard Normal Distribution 


z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 | .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359 
0.1 | .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753 
0.2 | .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141 
0.3 | .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517 
0.4 | .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879 
0.5 | .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224 
0.6 | .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549 
0.7 | .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852 
0.8 | .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133 
0.9 | .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389 
1.0 | .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621 
1.1 | .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830 
1.2 | .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015 
1.3 | .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177 
1.4 | .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319 
1.5 | .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441 
1.6 | .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545 
1.7 | .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633 
1.8 | .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706 
1.9 | .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767 
2.0 | .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817 
2.1 | .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857 
2.2 | .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890 
2.3 | .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916 
2.4 | .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936 
2.5 | .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952 
2.6 | .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964 
2.7 | .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974 
2.8 | .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981 
2.9 | .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986 
3.0 | .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990 
Also, for z = 4.0, 5.0, and 6.0, the probabilities are 0.49997, 0.4999997, and 0.499999999.
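Table III gives the area under the standard normal curve between 0 and z, that is, Φ(z) − 1/2 = erf(z/√2)/2. Python's `math.erf` makes this a one-liner; the helper name `normal_area` is ours.

```python
from math import erf, sqrt


def normal_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z, i.e. Phi(z) - 1/2."""
    return 0.5 * erf(z / sqrt(2.0))


if __name__ == "__main__":
    for z in (1.00, 1.645, 1.96, 2.576, 4.0):
        print(f"z = {z:5.3f}  area = {normal_area(z):.7f}")
```

The same function reproduces the values quoted for z = 4.0, 5.0, and 6.0 beyond the end of the table.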


TABLE IV 


Values of t_{α, ν}†

ν α = .10 α = .05 α = .025 α = .01 α = .005 ν

1 3.078 6.314 12.706 31.821 63.657 1
2 1.886 2.920 4.303 6.965 9.925 2
3 1.638 2.353 3.182 4.541 5.841 3
4 1.533 2.132 2.776 3.747 4.604 4
5 1.476 2.015 2.571 3.365 4.032 5
6 1.440 1.943 2.447 3.143 3.707 6
7 1.415 1.895 2.365 2.998 3.499 7
8 1.397 1.860 2.306 2.896 3.355 8
9 1.383 1.833 2.262 2.821 3.250 9
10 1.372 1.812 2.228 2.764 3.169 10
11 1.363 1.796 2.201 2.718 3.106 11
12 1.356 1.782 2.179 2.681 3.055 12
13 1.350 1.771 2.160 2.650 3.012 13
14 1.345 1.761 2.145 2.624 2.977 14
15 1.341 1.753 2.131 2.602 2.947 15
16 1.337 1.746 2.120 2.583 2.921 16
17 1.333 1.740 2.110 2.567 2.898 17
18 1.330 1.734 2.101 2.552 2.878 18
19 1.328 1.729 2.093 2.539 2.861 19
20 1.325 1.725 2.086 2.528 2.845 20
21 1.323 1.721 2.080 2.518 2.831 21
22 1.321 1.717 2.074 2.508 2.819 22
23 1.319 1.714 2.069 2.500 2.807 23
24 1.318 1.711 2.064 2.492 2.797 24
25 1.316 1.708 2.060 2.485 2.787 25
26 1.315 1.706 2.056 2.479 2.779 26
27 1.314 1.703 2.052 2.473 2.771 27
28 1.313 1.701 2.048 2.467 2.763 28
29 1.311 1.699 2.045 2.462 2.756 29
inf. 1.282 1.645 1.960 2.326 2.576 inf.


† Based on Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis, © 1982, Table 2, p. 582. By permission of Prentice-Hall, Inc., Englewood Cliffs, N.J.
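Table IV gives t_{α, ν}, the value with area α to its right under the t distribution with ν degrees of freedom. The Python standard library has no t quantile function, so the sketch below (helper names are ours) uses the substitution t = √ν tan φ, which turns the area between 0 and t into K ∫ cos^(ν − 1) φ dφ from 0 to θ, with K = Γ((ν + 1)/2)/(√π Γ(ν/2)). It integrates that by Simpson's rule and solves for θ by bisection. It is a numerical sketch for spot-checking entries, not a production quantile routine.

```python
from math import cos, gamma, pi, sqrt, tan


def t_area(theta: float, nu: int, steps: int = 1000) -> float:
    """Area under the t density (nu degrees of freedom) from 0 to t = sqrt(nu) * tan(theta).

    With t = sqrt(nu) * tan(phi) the integral becomes
    K * integral_0^theta cos(phi)**(nu - 1) dphi, K = Gamma((nu + 1) / 2) / (sqrt(pi) * Gamma(nu / 2)),
    which is evaluated here by Simpson's rule (steps must be even).
    """
    k = gamma((nu + 1) / 2) / (sqrt(pi) * gamma(nu / 2))
    h = theta / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * cos(i * h) ** (nu - 1)
    return k * total * h / 3


def t_critical(alpha: float, nu: int) -> float:
    """t_{alpha, nu}: the value with upper-tail area alpha, found by bisection on theta."""
    target = 0.5 - alpha  # required area between 0 and t_{alpha, nu}
    lo, hi = 0.0, pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_area(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return sqrt(nu) * tan((lo + hi) / 2)


if __name__ == "__main__":
    for nu in (1, 10, 29):
        print(nu, " ".join(f"{t_critical(a, nu):.3f}" for a in (0.10, 0.05, 0.025, 0.01, 0.005)))
```

The substitution is used because the transformed integrand is bounded and smooth on [0, π/2), so the long tails for small ν (for example 63.657 at ν = 1, α = .005) cause no numerical trouble.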
TABLE VII


Factorials 
n n! log n! 
0 1 0.0000 
1 1 0.0000 
2 2 0.3010 
3 6 0.7782 
4 24 1.3802 
5 120 2.0792 
6 720 2.8573 
7 5,040 3.7024 
8 40,320 4.6055 
9 362,880 5.5598 
10 3,628,800 6.5598 
11 39,916,800 7.6012 
12 479,001,600 8.6803 
13 6,227,020,800 9.7943 
14 87,178,291,200 10.9404 


15 1,307,674,368,000 12.1165 
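The log n! column can be obtained without forming n! at all, since ln n! = ln Γ(n + 1). A short Python sketch using `math.lgamma` (the function name `log10_factorial` is ours):

```python
from math import factorial, lgamma, log


def log10_factorial(n: int) -> float:
    """Common logarithm of n!, computed from lgamma(n + 1) = ln(n!) without forming n!."""
    return lgamma(n + 1) / log(10)


if __name__ == "__main__":
    for n in range(16):
        print(f"{n:2d}  {factorial(n):>17,}  {log10_factorial(n):7.4f}")
```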


Binomial Coefficients

n \ k 0 1 2 3 4 5 6 7 8 9 10
0 1
1 1 1
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
7 1 7 21 35 35 21 7 1
8 1 8 28 56 70 56 28 8 1
9 1 9 36 84 126 126 84 36 9 1
10 1 10 45 120 210 252 210 120 45 10 1
11 1 11 55 165 330 462 462 330 165 55 11
12 1 12 66 220 495 792 924 792 495 220 66
13 1 13 78 286 715 1287 1716 1716 1287 715 286
14 1 14 91 364 1001 2002 3003 3432 3003 2002 1001
15 1 15 105 455 1365 3003 5005 6435 6435 5005 3003
16 1 16 120 560 1820 4368 8008 11440 12870 11440 8008
17 1 17 136 680 2380 6188 12376 19448 24310 24310 19448
18 1 18 153 816 3060 8568 18564 31824 43758 48620 43758
19 1 19 171 969 3876 11628 27132 50388 75582 92378 92378
20 1 20 190 1140 4845 15504 38760 77520 125970 167960 184756
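The binomial coefficients satisfy Pascal's rule C(n, k) = C(n − 1, k − 1) + C(n − 1, k), so each row of the table follows from the one above it. The sketch below (function name ours) builds the rows this way and cross-checks them against `math.comb`; only k = 0 to 10 is printed, since the remaining entries follow from C(n, k) = C(n, n − k).

```python
from math import comb


def pascal_rows(n_max: int) -> list[list[int]]:
    """Rows 0..n_max of Pascal's triangle, built with C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows


if __name__ == "__main__":
    for n, row in enumerate(pascal_rows(20)):
        assert row == [comb(n, k) for k in range(n + 1)]  # sanity check against math.comb
        print(n, *row[:11])
```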
TABLE VIII

Values of e^x and e^-x

x e^x e^-x x e^x e^-x
0.0 1.000 1.000 2.5 12.18 0.082
0.1 1.105 0.905 2.6 13.46 0.074
0.2 1.221 0.819 2.7 14.88 0.067
0.3 1.350 0.741 2.8 16.44 0.061
0.4 1.492 0.670 2.9 18.17 0.055
0.5 1.649 0.607 3.0 20.09 0.050
0.6 1.822 0.549 3.1 22.20 0.045
0.7 2.014 0.497 3.2 24.53 0.041
0.8 2.226 0.449 3.3 27.11 0.037
0.9 2.460 0.407 3.4 29.96 0.033
1.0 2.718 0.368 3.5 33.12 0.030
1.1 3.004 0.333 3.6 36.60 0.027
1.2 3.320 0.301 3.7 40.45 0.025
1.3 3.669 0.273 3.8 44.70 0.022
1.4 4.055 0.247 3.9 49.40 0.020
1.5 4.482 0.223 4.0 54.60 0.018
1.6 4.953 0.202 4.1 60.34 0.017
1.7 5.474 0.183 4.2 66.69 0.015
1.8 6.050 0.165 4.3 73.70 0.014
1.9 6.686 0.150 4.4 81.45 0.012
2.0 7.389 0.135 4.5 90.02 0.011
2.1 8.166 0.122 4.6 99.48 0.010
2.2 9.025 0.111 4.7 109.95 0.009
2.3 9.974 0.100 4.8 121.51 0.008
2.4 11.023 0.091 4.9 134.29 0.007

TABLE VIII (continued)

x e^x e^-x x e^x e^-x
5.0 148.4 0.0067 7.5 1,808.0 0.00055
5.1 164.0 0.0061 7.6 1,998.2 0.00050
5.2 181.3 0.0055 7.7 2,208.3 0.00045
5.3 200.3 0.0050 7.8 2,440.6 0.00041
5.4 221.4 0.0045 7.9 2,697.3 0.00037
5.5 244.7 0.0041 8.0 2,981.0 0.00034
5.6 270.4 0.0037 8.1 3,294.5 0.00030
5.7 298.9 0.0033 8.2 3,641.0 0.00027
5.8 330.3 0.0030 8.3 4,023.9 0.00025
5.9 365.0 0.0027 8.4 4,447.1 0.00022
6.0 403.4 0.0025 8.5 4,914.8 0.00020
6.1 445.9 0.0022 8.6 5,431.7 0.00018
6.2 492.8 0.0020 8.7 6,002.9 0.00017
6.3 544.6 0.0018 8.8 6,634.2 0.00015
6.4 601.8 0.0017 8.9 7,332.0 0.00014
6.5 665.1 0.0015 9.0 8,103.1 0.00012
6.6 735.1 0.0014 9.1 8,955.3 0.00011
6.7 812.4 0.0012 9.2 9,897.1 0.00010
6.8 897.8 0.0011 9.3 10,938 0.00009
6.9 992.3 0.0010 9.4 12,088 0.00008
7.0 1,096.6 0.0009 9.5 13,360 0.00007
7.1 1,212.0 0.0008 9.6 14,765 0.00007
7.2 1,339.4 0.0007 9.7 16,318 0.00006
7.3 1,480.3 0.0007 9.8 18,034 0.00006
7.4 1,636.0 0.0006 9.9 19,930 0.00005
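Entries of Table VIII can be regenerated with `math.exp`. The helper below (our name) returns both e^x and e^-x for one argument and prints the first page in the table's layout.

```python
from math import exp


def table_viii_row(x: float) -> tuple[float, float]:
    """Return (e**x, e**-x) for one argument of Table VIII."""
    return exp(x), exp(-x)


if __name__ == "__main__":
    # Arguments 0.0, 0.1, ..., 4.9, as on the first page of the table.
    for i in range(50):
        x = i / 10
        ex, emx = table_viii_row(x)
        print(f"{x:3.1f}  {ex:8.3f}  {emx:.3f}")
```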


TABLE IX

Critical values of T†

n T.10 T.05 T.02 T.01
4
5 1
6 2 1
7 4 2 0
8 6 4 2 0
9 8 6 3 2
10 11 8 5 3
11 14 11 7 5
12 17 14 10 7
13 21 17 13 10
14 26 21 16 13
15 30 25 20 16

† From F. Wilcoxon and R. A. Wilcox, Some Rapid Approximate Statistical Procedures, American Cyanamid Company, Pearl River, N.Y., 1964. Reproduced with permission of American Cyanamid Company.
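Under the null hypothesis each of the 2^n assignments of signs to the ranks 1, 2, ..., n is equally likely, so the exact distribution of the sum of the positive ranks can be counted with a 0/1 knapsack recursion. The sketch below (function names ours) gives the exact lower-tail probabilities that underlie critical values such as those in Table IX. Whether the table uses ≤ or <, and one- or two-sided α, should be checked against the text before comparing.

```python
def signed_rank_counts(n: int) -> list[int]:
    """counts[t] = number of the 2**n sign patterns whose positive ranks sum to t."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        # Traverse sums downward so that each rank is used at most once (0/1 knapsack).
        for t in range(max_sum, rank - 1, -1):
            counts[t] += counts[t - rank]
    return counts


def lower_tail(n: int, t: int) -> float:
    """P(T+ <= t) under the null hypothesis; by symmetry this equals P(T- <= t)."""
    counts = signed_rank_counts(n)
    return sum(counts[: t + 1]) / 2**n


if __name__ == "__main__":
    for t in range(6):
        print(t, f"{lower_tail(10, t):.4f}")
```

For n = 10, exactly 5 of the 1,024 sign patterns give T+ ≤ 3 (positive ranks ∅, {1}, {2}, {3}, or {1, 2}), so P(T+ ≤ 3) = 5/1024 ≈ 0.0049.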


TABLE X

Critical values of U†

[The three panels of this table, which give critical values of U for sample sizes n1, n2 = 2 to 15 at three levels of significance, are illegible in this copy.]

† This table is based on Table 11.4 of D. B. Owen, Handbook of Statistical Tables, © 1962, U.S. Department of Energy. Published by Addison-Wesley Publishing Company, Inc., Reading, Mass. Reprinted with permission of the publisher.
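Values of the kind given in Table X can be recomputed from the exact null distribution of U. The sketch assumes that U counts the pairs in which an observation from the first sample exceeds one from the second, and that there are no ties, so all C(n1 + n2, n1) orderings are equally likely. Conditioning on which sample the largest observation comes from gives the recursion used below. The convention in `u_critical` (largest u with P(U ≤ u) ≤ α) is our assumption and may differ from the text's definition of U_α; all function names are ours.

```python
from functools import lru_cache
from math import comb
from typing import Optional


@lru_cache(maxsize=None)
def u_count(u: int, n1: int, n2: int) -> int:
    """Number of the C(n1 + n2, n1) equally likely orderings in which U equals u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # Largest observation from sample 1: it exceeds all n2 values of sample 2.
    # Largest observation from sample 2: it adds nothing to U.
    return u_count(u - n2, n1 - 1, n2) + u_count(u, n1, n2 - 1)


def u_lower_tail(u: int, n1: int, n2: int) -> float:
    """P(U <= u) under the null hypothesis."""
    return sum(u_count(k, n1, n2) for k in range(u + 1)) / comb(n1 + n2, n1)


def u_critical(alpha: float, n1: int, n2: int) -> Optional[int]:
    """Largest u with P(U <= u) <= alpha, or None when no such u exists."""
    best = None
    u = 0
    while u_lower_tail(u, n1, n2) <= alpha:
        best = u
        u += 1
    return best


if __name__ == "__main__":
    print([u_count(u, 2, 2) for u in range(5)])
    print(u_critical(0.05, 3, 3), u_critical(0.01, 3, 3))
```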


TABLE XI

Critical values of u†

Values of u'.025

[Entries illegible in this copy.]

Values of u.025

[Entries illegible in this copy.]

† This table is adapted, by permission, from F. S. Swed and C. Eisenhart, "Tables for testing randomness of grouping in a sequence of alternatives," Annals of Mathematical Statistics, Vol. 14.
TABLE XI (continued)

Values of u'.005

n1 \ n2  3  4  5  6  7  8  9 10 11 12 13 14 15
3                                   2  2  2  2
4                       2  2  2  2  2  2  2  3
5                 2  2  2  2  3  3  3  3  3  3
6              2  2  2  3  3  3  3  3  3  4  4
7              2  2  3  3  3  3  4  4  4  4  4
8           2  2  3  3  3  3  4  4  4  5  5  5
9           2  2  3  3  3  4  4  5  5  5  5  6
10          2  3  3  3  4  4  5  5  5  5  6  6
11          2  3  3  4  4  5  5  5  6  6  6  7
12       2  2  3  3  4  4  5  5  6  6  6  7  7
13       2  2  3  3  4  5  5  5  6  6  7  7  7
14       2  2  3  4  4  5  5  6  6  7  7  7  8
15       2  3  3  4  4  5  6  6  7  7  7  8  8

Values of u.005

n1 \ n2  5  6  7  8  9 10 11 12 13 14 15
5          11
6       11 12 13 13
7          13 13 14 15 15 15
8          13 14 15 15 16 16 17 17 17
9             15 15 16 17 17 18 18 18 19
10            15 16 17 17 18 19 19 19 20
11            15 16 17 18 19 19 20 20 21
12               17 18 19 19 20 21 21 22
13               17 18 19 20 21 21 22 22
14               17 18 19 20 21 22 23 23
15                  19 20 21 22 22 23 24
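The entries of Table XI come from the exact distribution of the total number of runs u when n1 symbols of one kind and n2 of another are arranged at random. For r = 2k there are 2 C(n1 − 1, k − 1) C(n2 − 1, k − 1) arrangements with r runs, and for r = 2k + 1 there are C(n1 − 1, k − 1) C(n2 − 1, k) + C(n1 − 1, k) C(n2 − 1, k − 1), out of C(n1 + n2, n1) equally likely ones. The sketch below (function names ours) turns this into critical values, assuming u'_α is the largest u with P(at most u runs) ≤ α and u_α the smallest u with P(at least u runs) ≤ α. With that reading it gives 3 for n1 = 4, n2 = 15 and 5 for n1 = 8, n2 = 13 on the lower side at α = .005, and 11 for n1 = 5, n2 = 6 on the upper side.

```python
from math import comb
from typing import Optional


def runs_pmf(r: int, n1: int, n2: int) -> float:
    """P(exactly r runs) for n1 >= 1 symbols of one kind and n2 >= 1 of the other in random order."""
    if r < 2:
        return 0.0
    k, odd = divmod(r, 2)
    if odd == 0:  # r = 2k: k runs of each kind
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:  # r = 2k + 1: k + 1 runs of one kind and k of the other
        ways = comb(n1 - 1, k - 1) * comb(n2 - 1, k) + comb(n1 - 1, k) * comb(n2 - 1, k - 1)
    return ways / comb(n1 + n2, n1)


def lower_critical(alpha: float, n1: int, n2: int) -> Optional[int]:
    """Largest u with P(runs <= u) <= alpha, or None if even u = 2 is too likely."""
    best, cum = None, 0.0
    for r in range(2, n1 + n2 + 1):
        cum += runs_pmf(r, n1, n2)
        if cum > alpha:
            break
        best = r
    return best


def upper_critical(alpha: float, n1: int, n2: int) -> Optional[int]:
    """Smallest u with P(runs >= u) <= alpha, or None if no such u exists."""
    best, cum = None, 0.0
    for r in range(n1 + n2, 1, -1):
        cum += runs_pmf(r, n1, n2)
        if cum > alpha:
            break
        best = r
    return best


if __name__ == "__main__":
    print(lower_critical(0.005, 4, 15), lower_critical(0.005, 8, 13), upper_critical(0.005, 5, 6))
```

Floating-point accumulation is adequate for these table sizes; for a probability that lands exactly on α, exact fractions (`fractions.Fraction`) would be the safer choice.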
Answers to Odd-Numbered Exercises




(b) 6, 20, and 70. 

(a) 0.83% and 0.69%; (b) about 635 billion. 

(b) $x^6 + 6x^5y + 15x^4y^2 + 20x^3y^3 + 15x^2y^4 + 6xy^5 + y^6$;
$x^7 + 7x^6y + 21x^5y^2 + 35x^4y^3 + 35x^3y^4 + 21x^2y^5 + 7xy^6 + y^7$.
560.

(a) 5; (b) 4.

(a) 20; (b) 60.

$3^{15} = 14{,}348{,}907$.

720. 

120; 72. 

50,400; 3,360. 

(a) 77,520; (b) 184,756; (c) 1,351. 

70. 

8,211,173,256. 

280. 


(a) {6,8,9}; (b) (8; (c) {1,2,3,4, 5,8}; (d) (1, S5 (е) (2,4,8); (0@. 

(a) {Car 5, Car 6, Car 7, Car 8}; (b) {Car 2, Car 4, Car 5, Car 7}; (c) {Car 1, Car 8}; (d) {Car 3, Car 4, Car 7, Car 8}; (e) he chooses a car with air-conditioning; (f) he chooses a car which has either no power steering or does have bucket seats; (g) he chooses a 2- or 3-year-old car with bucket seats; (h) same as part (g).


11.
13. 




(a) (0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (0, 1), (0, 2), (0, 3), (0, 4); (b) (0, 2), (1, 1), (2, 0), (0, 4), (1, 3), (2, 2), (3, 1), (4, 0), (2, 4), (3, 3), (4, 2), (5, 1); (c) (0, 0), (1, 1), (2, 2), (3, 3), (4, 4); (d) (0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 1), (1, 3), (2, 2), (3, 1), (2, 4), (3, 3), (4, 2), (5, 1); (e) (0, 4), (0, 2), (2, 0), (4, 0); (f) (0, 2), (1, 1), (2, 0), (0, 4), (1, 3), (2, 2), (3, 1), (4, 0), (2, 4), (3, 3), (4, 2), (5, 1), (0, 0), (4, 4); (g) (0, 4), (0, 3), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (1, 4), (1, 2), (2, 3), (2, 1), (3, 4), (3, 2), (4, 4), (4, 3), (4, 1), (5, 4), (5, 3), (5, 2); (h) (0, 4), (0, 3), (0, 2), (0, 1), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0); (i) (1, 4), (1, 2), (2, 3), (2, 1), (3, 4), (3, 2), (4, 4), (4, 3), (4, 1), (5, 4), (5, 3), (5, 2).

(a) S = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)}; A = {(1, 0, 1), (0, 1, 1), (1, 1, 1)}; B = {(0, 1, 1)}; C = {(1, 0, 1)}; (b) B and C.
(a) МОМ = {х|3 < x < 10}; (Мо N = {x|5< x <8}; (c) Мом = 
(x3 < x = 5}; (d) M’ o N= {x| <x <3 or 5 <x < 10}. 

38. 

(a) 12; (b) 6; (c) 20.


(a) Permissible assignment; (b) not a permissible assignment because the sum of 
the probabilities exceeds 1; (c) permissible assignment; (d) not a permissible 
assignment because probabilities cannot be negative; (e) not a permissible 
assignment because the sum of the probabilities is less than 1. 

(a) the second probability cannot be negative; (b) the third probability does not 
equal the sum of the first two probabilities; (c) the sum of the probabilities 
exceeds one; (d) the sum of the probabilities is less than one. 

(a) Yes; (b) 9 to 11.

(a) 0.46; (b) 0.40; (c) 0.11; (d) 0.68.


а) б (Dk (o (di (ei. 


- (a) ie; (b 15; (с)ё&; (d) 1858. 
(a) P(A ∪ B) cannot be less than P(A); (b) P(A ∩ B) cannot exceed P(A); (c) P(A ∪ B) = 0.72 + 0.84 − 0.52 = 1.04 > 1.


. 034. 


1 
0.94. 




(а) 49; (b) 3% (с) 2; (à) 
0.7685. 


(а) з; (bs; (с) BE. 
(а) s; (b) ds 
0.8784. 

0.735. 

0.032. 

0.8. 


Page 85 


Page 97 


13. 


15. 


17. 


19. 


. (а) (bm 
0.3818.
(c) $- 


„б< ЕК << Ь 


0 forx « 0 
fotüsx«1 


1 
5 
F()-15. ордуң <2 
1 Ғогх 2 2 
()5 (b) & 
(c) x f(x) 


1 i 

4 E 

6 i 

10 i 

yy | S Q0 E 
fo) WEN Mond 
(a). 3x d б LE 
fols &^"h 


0 forx <0 
34 ford<x<1 
Ѓог1 =х < 2 
1 forx >2 
0 forx <0 
d (ог0=х<1 
F(x) =4% forl<x<2 
У for2<x <3 
1 forx 2 3 


(а) 9; (b) & 

(a) 0.33; (b) 0.13; (c) 0.04; (d) 0.78.

(b) i. 
0 forx = 0 
Ух н 

ai o) F(a) =] feocx« (el 
1 forx 2 4 


forx «0 


0 
(a); (b) GG = | хз xy оох <1; (Ob 


forx = 1 
0 for x < 0
2 for0< x <1 
3 
7. F(x) = i forl<x<2 
xu 
3 for2<x<4 
1 forx > 4 
9. (а) с= 2; 
0 forx <0 
x? 
г Ѓог0 < х <1 
(b) F(x) = x 
vl forl<x<2 
1 forx 22 
(c) 0.36. 


ч. 9b (o ӨЛЕ ека 


13. (a)1—3e7; (Ы) 2е7! —4e7*; (c) 5e-*; 
ec forx > 0 
(4) f(x) = n elsewhere 
15. (a)i; (Ы) 1; (ob (di 
17. (а) д; (Ы) 8, (cS; (9) o. 
19. (a) 0.4692; (b) 0.0986; (c) 0.2019.
21. (а) 4; (3; (с). 


Раде 113 1. (а) 5; (b)i; (0h (9) 5. 
3. If k > 0, P(x = 3, y = 1) < 0, and if k < 0, all the other probabilities are negative.


7. 
9. (a) 5; (Ы) $; (е) 4. 
1. (a) Р(х,х,) = 0; (b) F(x, x2) = 


Lh. (c) Р(х, x2) = 2хүх›; three other 
regions. 


ee |? 
ee 

p——— M. -— nes 

-————————————— ыз 


Page 128 


13. 


15. 
17. 
21. 
Axye 97? forx > 0,y > 0
(a) f(x y) = (4 д elsewhere ý 

(6) (e! — ety. 

F(b, d) − F(a, d) − F(b, c) + F(a, c).
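The expression above is presumably the probability that (x, y) falls in the rectangle a < x ≤ b, c < y ≤ d, written in terms of the joint distribution function F. As an illustration (the distribution is our choice, not the exercise's), take x and y independent and uniform on (0, 1), so that F(x, y) = xy on the unit square and the rectangle probability must equal (b − a)(d − c). Function names are ours.

```python
def uniform_cdf(x: float, y: float) -> float:
    """Joint distribution function of two independent random variables uniform on (0, 1)."""
    def clip(v: float) -> float:
        return min(max(v, 0.0), 1.0)
    return clip(x) * clip(y)


def rectangle_probability(F, a: float, b: float, c: float, d: float) -> float:
    """P(a < x <= b, c < y <= d) expressed through the joint distribution function F."""
    return F(b, d) - F(a, d) - F(b, c) + F(a, c)


if __name__ == "__main__":
    # For independent uniforms this should equal (0.5 - 0.2) * (0.9 - 0.1) = 0.24.
    print(rectangle_probability(uniform_cdf, 0.2, 0.5, 0.1, 0.9))
```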

(а); (b); (OX (9) 8; (90 (0i 


(a) Green die 

(1,1) (0,1) (0,1) (0,2) (0,2) (0,2) 
(1,1) (0,1) (0,1) (0,2) (0,2) (0,2) 
(1,1) (0,1) (0,1) (0,2) (0,2) (0,2) 
(1,0) (0,0 (0,0 (0,1) (0,1) (0,1) 
(1,0) (0,0) (0,0) (0,1) (0,1) (0,1) 
(1,0) (1,07 (5,1 (1,1) (1,1) 


Red die 


(b) x 
901 772 
j j * 
0001124 i 
H 


x [1-11 yi tags! 
(b) 
nj gem Ж, (|i 4 4 
x r1 1 
xz
3. (a) mx, y) =~ for x = 1, 2,3 and y = 1, 2, 3; (po 2 = for x = 1, 


36 
2 
2,3andz = 1, 2; (c) g(x) =< for x = 1,2,3; (d) e(z|1,2) = 2 for z = 
yz 
1,2; (е) Wy, 2/3) = T fory 21,2,3andz = 1,2. 
0 forx <0 
G^ for0<x<1 
240 
оар u forl<x<2 
1 forx = 2 
0 forx <0 
(b) Е(х|1) = 45 for0<x<1 
1 forx > 1 
4(1 = х)? for0 < x <1 
Ж = 
сы ^ elsewhere 
12у(1 – y? for0<y<1 
h = 
ee t elsewhere 
Not independent. 
2(1 + 3x) 
9. (а) р(х = х= 2) = 5 SEDER 1 
0 elsewhere 


1 -=x 
(b) TEE = {E496 for0 < x, < 1andx > 0 


0 elsewhere 
0 for x, = Oor x, = 0 
11. М(х, хз) = 4 3x,(x, + 1)(1- e) ‘for0 < x, < 1,x,>0 
dle forx, 2 1,x,» 0 
0 forx, <0 
G(x) = 4 x(x +1) ого < Кы! 
1 for x, > 1 
13. 3, 
2 
0 1 
15. (а) 


2 w | 1 2 
(c) (d) 
hw) | BEA ш mur 
2, 
2(x--:2). | forü € x-« 1 
17. ={? Б 
wee) i elsewhere (b) 0.742; 
IEW foro <y <1
or 
(с) w(ylx) = 4 x +2 : (d) 0.273. 
0- elsewhere 
207 x 
forlü < x < 
19. (a) g(x) = 4 50 iata a 
0 elsewhere 


x(nni-»* D for5 < y « 10 


25 
hy) = Xon -20+ ») for10 < y < 20 
25 y 
0 elsewhere 
1 
Lala for6 < у < 12 
(b) w(yl12) = E elsewhere 
(c) $. 
(20,000)* 
zonis D E a cance Oy ха 004240 
21. (a) Лоо, хь) = $ Са + 100) Ga + 100) Gs, + 100)? i $ 
0 elsewhere 
(b) 1. 1 
Page 143 1. (8702871874879; (b) P[g(x) = 0] = f(0), P[g(x) = 1] = f(-1) + f(1), P[g(x) = 4] = f(-2) + f(2), P[g(x) = 9] = f(3).
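In (b), each probability collects the values of x that g maps to the same point. This is the pattern produced by g(x) = x² on a support containing -2, -1, 0, 1, 2, 3; that reading, and the uniform probability distribution used below, are assumptions for illustration. A short Python sketch of the general rule P[g(x) = y] = Σ f(x), summed over all x with g(x) = y:

```python
from collections import defaultdict
from fractions import Fraction

def distribution_of_g(f, g):
    """Probability distribution of g(X), given the probability distribution f of X as a dict."""
    result = defaultdict(Fraction)  # Fraction() is 0
    for x, p in f.items():
        result[g(x)] += p  # every x mapped to the same value contributes its probability
    return dict(result)

# Hypothetical example: X uniform on {-2, ..., 3} and g(x) = x**2.
f = {x: Fraction(1, 6) for x in range(-2, 4)}
dist = distribution_of_g(f, lambda x: x * x)
print(dist)
```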
5. (a) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,f(x,y)\,dx\,dy$; (b) $\int_{-\infty}^{\infty} x\,g(x)\,dx$.
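Both expressions give E(x). Part (a) integrates against the joint density, and part (b) integrates against the marginal density g(x). The following numerical check uses the midpoint rule with the hypothetical joint density f(x, y) = x + y on the unit square, for which g(x) = x + 1/2 and E(x) = 7/12:

```python
# Hypothetical joint density on the unit square: f(x, y) = x + y.
def f(x, y):
    return x + y

# Its marginal density: the integral of (x + y) over 0 < y < 1.
def g(x):
    return x + 0.5

N = 400                      # midpoint-rule subintervals per axis
h = 1.0 / N
mids = [(i + 0.5) * h for i in range(N)]

# (a) Double integral of x * f(x, y) over the square.
e_joint = sum(x * f(x, y) for x in mids for y in mids) * h * h
# (b) Single integral of x * g(x) over (0, 1).
e_marginal = sum(x * g(x) for x in mids) * h

print(e_joint, e_marginal)   # both close to 7/12 = 0.58333...
```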
9. (a) 2.4 and 6.24; (b) 88.96.


17. $750. 

19. (a) $1.60; (b) $1.67; (c) $1.50.
21. 30,000 kilometers. 

23. $10,000. 


Page 156 3. u =$, ш = 2, and o° = & 
11. (a) as = 3.2; (b) a, = 26. 
15. (a) k = √20; (b) k = 10.
К 


2е 
T. M(t) = wh Thu 
Sie 
29. At least f].
31. By Chebyshev's theorem, P(x < 10) > j. The exact probability is 0.9179.


Page 171

3. 0. 

5. cov(x,y) = 0. 
7. 

9. 

11. (a) μ = -7 and σ² = 155; (b) μ = 19 and σ² = 36.
18. 


var(x + y) = var(x) + var(y) + 2 cov(x, y), var(x - y) = var(x) + var(y) - 2 cov(x, y), cov(x + y, x - y) = var(x) - var(y).
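These identities hold for any joint distribution with finite variances. In particular, they hold exactly for the empirical distribution of paired data when every variance and covariance uses the divisor n. A quick Python check on made-up data (pcov is a small helper defined here):

```python
from statistics import fmean, pvariance

def pcov(u, v):
    """Population covariance (divisor n) of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Made-up paired data; the identities hold for any values.
x = [2.0, 4.0, 4.0, 5.0, 7.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0]
s = [a + b for a, b in zip(x, y)]  # x + y
d = [a - b for a, b in zip(x, y)]  # x - y

print(pvariance(s), pvariance(x) + pvariance(y) + 2 * pcov(x, y))
print(pvariance(d), pvariance(x) + pvariance(y) - 2 * pcov(x, y))
print(pcov(s, d), pvariance(x) - pvariance(y))
```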
13. -56. 
15. 3. 
7. d 
19. 0.24.
21. μ = 424.5 inches and σ = 0.74 inch.
23. μ = 475 and σ = 1.58.
25. $9. 


x 


& =. 
n 
11, (с) Falt) = [1 + (г ~ 0)". 
13. 0.1707.
15. (a) 0.2066; (b) 0.2066.
17. (a) 0.1669; (b) 0.4073; (c) 0.4073. 
19. 0.9222 and 0.0778. 


Page 198 1. fone (151 eam for y = 0,1,2,...; 


1 кү 
En ( i} lida iG ) 
de. Н. И, й, ds. 
из uy t ua А + ЗА, 
(a) 0.1298; (b) 0.1101.
0.4491.
(a) 3; OE (OR; (4%
(a) 0.1388; (b) 0.1354.
0.2700.
(a) 0.2584; (b) 0.6866.
(a) 0.0232; (b) 0.8097; (c) 0.1284.




Page 206 1. 0.0840.
3. 0.0291.


5. (a) 0.1798; (b) 0.1798.


Page 216 


Page 235 




Page 244 


Page 261 
7. (a) 0.6151; (b) 0.9197.
15. (a) k = af. 


0.2639. 


2. 

25. 100. 

27. (a) 0.6065; (b) 0.5276.
29. 

31. (a) 3,200 hours; (b) 0.2057.


11. (a) py = 72, из = 1, оу 10,0, 5, and p = OT; (b) м, ™ 100 + 0350 and 
e$, = 1275. 


о ~ 
nio. E * P, = 4plvlv] 


15. (a) 0.1271; (b) 0.3594; (c) 0.1413; (d) 0.5876.
17. (a) 1.645; (b) 1.96; (c) 2.33; (d) 2.575.
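These four values are the familiar upper-tail standard normal quantiles z_α for α = 0.05, 0.025, 0.01 and 0.005. That the exercise asks for exactly these is an inference from the answers. They can be reproduced with Python's statistics.NormalDist:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
for alpha in (0.05, 0.025, 0.01, 0.005):
    # z_alpha satisfies P(Z > z_alpha) = alpha, i.e. it is the (1 - alpha) quantile.
    z_alpha = std_normal.inv_cdf(1 - alpha)
    print(alpha, round(z_alpha, 3))
```

The exact quantiles round to 2.326 and 2.576. The answers 2.33 and 2.575 are the values usually read from a normal table.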

19. 6.094 ounces. 

21. 0.1446. 

23. (a) 0.2990; (b) 0.1774.

25. (a) 14.5 pounds; (b) 23.625 inches.


= е — foy» 
ЖЛ aw- fi ‹ y 
* — forty» 
(b) so [; у 


о <! 
CHR ide foro < y 
nearer fory > 9 
6. (а) ) = 
a) JO 
C" fory 0 
(b) fiy) = we 
L] chewhere 
are" fors o 
7. -f| мен 
10 ~ 20102) м0 < 145 
s. f) = -4( 2010 ~ 2+ m9) fors << 0 


[ЖОЕ үү" for y = =1, 4 Sl ТЕ ОЕ 
3.
1920 Жил ANI. 
ly! for0< y <8 
TEM | elsewhere 
9. α = 1 and β = 2.
) for0<y<1 
11. (a) у)={4 folsy«c3 
0 elsewhere 


125/4.  fo0«z«1 
(b) h(z2 = 4dz?* = forl < z < 81 
0 elsewhere 


DT 
13. (a) f(y, у) = 2 for у, = 2, 3, 4, 5, у; = —2,—1,0,1,апд2 = у + y < 4; 


72 
(b) а1(2) = vs, &3) = т, 81(4) = ть, and g,(5) = $. 
Uh А РАИ Pal RO DNE 9.73 2 
15. (a) (b) (c) Е 
gwlt è à "EF fle #4 4 
6 + 6z - 12√z for 0 < z < 1
0 elsewhere
21. (a) g(u, y) = 1/2 for the region bounded by y = 0, u = y, and 2y - u = 2, and g(u, y) = 0 elsewhere;
(b) h(u) = (2 + u)/4 for -2 < u < 0, h(u) = (2 - u)/4 for 0 < u < 2, and h(u) = 0 elsewhere.
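The marginal density of u comes from integrating the constant joint density 1/2 over y for each fixed u. In the (u, y) plane the region bounded by y = 0, u = y and 2y - u = 2 is the triangle with vertices (-2, 0), (0, 0) and (2, 2), whose area is 2. A minimal Python check of that integration (the function names are illustrative):

```python
def y_limits(u):
    """Lower and upper limits of y inside the triangle for a fixed u."""
    if -2 < u < 0:
        return 0.0, (u + 2) / 2     # between y = 0 and 2y - u = 2
    if 0 <= u < 2:
        return u, (u + 2) / 2       # between u = y and 2y - u = 2
    return 0.0, 0.0                 # u outside the triangle

def h_integrated(u):
    lo, hi = y_limits(u)
    return 0.5 * (hi - lo)          # integral of the constant density 1/2 over y

def h_formula(u):
    if -2 < u < 0:
        return (2 + u) / 4
    if 0 <= u < 2:
        return (2 - u) / 4
    return 0.0

for u in (-1.5, -0.5, 0.5, 1.5):
    print(u, h_integrated(u), h_formula(u))
```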


23. f(w, z) = 24w(z - w) for w > 0, z < 1, and z > w.


Page 268 3. $M_Y(t) = (1-\beta t)^{-n\alpha}$; a gamma distribution with the parameters nα and β.
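This result follows because the moment-generating function of a sum of independent random variables is the product of their moment-generating functions. The check below assumes, as the answer suggests, that Y is the sum of n independent gamma random variables with parameters α and β, each with M(t) = (1 - βt)^(-α). A numerical spot check in Python with arbitrary parameter values:

```python
import math

def gamma_mgf(t, alpha, beta):
    """Moment-generating function (1 - beta*t)^(-alpha) of a gamma distribution, for t < 1/beta."""
    return (1 - beta * t) ** (-alpha)

n, alpha, beta, t = 4, 1.5, 2.0, 0.1   # arbitrary values with t < 1/beta
product = math.prod(gamma_mgf(t, alpha, beta) for _ in range(n))  # MGF of the sum
single = gamma_mgf(t, n * alpha, beta)                             # gamma with n*alpha, beta
print(product, single)
```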
7. (a) 0.1021; (b) 0.0259.
9. (a) 0.475; (b) 0.570.


Page 280 15. 0.000000015. 
17. (a) It is divided by 2; (b) it is divided by 1.5; (c) it is multiplied by 3; (d) it is multiplied by 2.5.
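These answers follow from the standard error of the mean, σ/√n: multiplying n by c multiplies σ/√n by 1/√c. The multipliers below (4, 2.25, 1/9 and 1/6.25) are hypothetical values chosen because they reproduce (a) through (d). The sample sizes in the exercise itself are not given here.

```python
import math

def se_multiplier(c):
    """Factor by which sigma / sqrt(n) changes when n is multiplied by c."""
    return 1 / math.sqrt(c)

# Hypothetical multipliers of n that reproduce answers (a) through (d).
for c in (4, 2.25, 1 / 9, 1 / 6.25):
    print(c, round(se_multiplier(c), 4))
```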
19. At least 2$. 
21. (a) At most 1/4; (b) 0.0456.
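The two answers are consistent with comparing Chebyshev's bound for k = 2 with the exact probability that a normal random variable falls at least two standard deviations from its mean. That reading of the exercise is an inference. In Python:

```python
from statistics import NormalDist

k = 2
chebyshev_bound = 1 / k ** 2                   # P(|X - mu| >= k*sigma) <= 1/k^2, any distribution
normal_exact = 2 * (1 - NormalDist().cdf(k))  # the same probability when X is normal
print(chebyshev_bound, round(normal_exact, 4))  # 0.25 0.0455
```

The exact normal value rounds to 0.0455. Tables with four-decimal areas give 0.0456.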
23. (a) 0.0388; (b) 0.7100; (c) 0.1736.


Page 295 


Page 302 


Page 316 


Page 328 


13. 




13. 


15. 
17. 
19. 


3. 
The value will fall between -18.025 and 18.025.
0.05. 
t = -1.35; the data support the claim.


0.0763 
14 
27- 
0.055 
t = 3.57; the data do not support the conjecture. 
0.05. 
1 n 

= „апа о? = 3 

Bee T ^ (n- (n2) 


FoU TR LU ү SEDI X Maas anys 
(a) (b) 
(Vi) E NEUE so) E A P B od 
$h(y, R) = n(n-1)\,f(y)\,f(y+R)\left[\int_{y}^{y+R} f(x)\,dx\right]^{n-2}$


» 


$g(R) = n(n-1)(1-R)R^{n-2}$ for 0 < R < 1 and g(R) = 0 elsewhere;


$E(R) = \frac{n-1}{n+1}$ and $\sigma_R^2 = \frac{2(n-1)}{(n+1)^2(n+2)}$.
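The density g(R) is that of the range of a random sample of size n from the uniform density on (0, 1). Its mean is (n - 1)/(n + 1) and its variance is 2(n - 1)/[(n + 1)²(n + 2)]. A numerical check with the midpoint rule for n = 5 (an arbitrary choice):

```python
def g(r, n):
    """Density of the sample range R for a sample of size n from the uniform density on (0, 1)."""
    return n * (n - 1) * (1 - r) * r ** (n - 2)

def midpoint_moments(n, steps=20000):
    """Total probability, mean and variance of R by the midpoint rule on (0, 1)."""
    h = 1.0 / steps
    mids = [(i + 0.5) * h for i in range(steps)]
    total = sum(g(r, n) for r in mids) * h
    mean = sum(r * g(r, n) for r in mids) * h
    second = sum(r * r * g(r, n) for r in mids) * h
    return total, mean, second - mean ** 2

n = 5
total, mean, var = midpoint_moments(n)
print(total, mean, var)
print((n - 1) / (n + 1), 2 * (n - 1) / ((n + 1) ** 2 * (n + 2)))
```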


0.081. 


(a) Decision reversed; (b) decision is the same. 
(a) Go to the site 33 miles away; (b) go to the site 27 miles away; (c) it does not matter.

(a) Expand now; (b) hotel Y; (c) go to the site 27 miles away. 

(a) Hotel Y; (b) go to the site 27 miles away. 

(a) Strategies I and 2, the value is 5; (b) Strategies II and 1, the value is 11; 

(c) Strategies I and 1, the value is -5; (d) Strategies I and 2, the value is 8.




(a) NSK о [| (b) Give away glasses and steak knives. 


(а) & and $; (b) апа т; (o -f 

4 and 3 for the defender; $ and 1 for the attacker; the value is $10,333,333. 

(a) First station owner should lower his price; (b) the owners should take turns 
lowering their prices on alternate days. 
Page 345


Page 354 


Page 361 


Page 373 


Page 382 


15. 
21. 


T. 
13. 






a(0 = 4) aX0 = 3) 


(a)


(6) on = 4, 4(1) = 4, and Ko 4; d,(0) = 4, d;(1) = 4, and d,(2) =}; 
шш 4, d) = }, and d,(2) = 540) = 4, d,(1) = }, and d,(2) = 3; d;(0) = 
3, 4(1) = 4, and 4(2) = {; 4(0) = 4, de(1) = 4, and 4,(2) = 4; 4,00) = 3, 

d,(1) = 3, and d,(2) = 4; d(0) = 3, d(1) =4, and d(2) = 4; (c) ds, dg, and d; 
are not admissible; (d) dj; (e) d}.

(a) Biased; (b) consistent. 


2m}. 
(a) X; (b) X 


Ў (x; — ш)? 
Jim 
MAIER) cae 


The smallest sample value. 

m + 2п; п + nnn; 
NE Birra E 2N 
The smallest and largest sample values. 
XX + y) and I(x - y). 


0.29.
0.4786. 
(a) 100; (b) 112; (c) 108.


EC e 
~ In(1 = a) 
1+Vi-a@ 
z : 
27.52 < μ < 29.48.
5.47 < μ < 5.89.
-7.285 < μ₁ - μ₂ < -2.715.
-0.198 < μ₁ - μ₂ < 1.998.


c= 


Aas 8) zl 6(1 - ô) 
Fert 
2n Za/2 m В Tf = 


0.611 < θ < 0.749.
n = 1,068.


Page 395 


Page 408 


Page 421 


Page 427 


Page 434 


11. 


13. 
-0.372 < θ₁ - θ₂ < -0.204.
0.040 < σ² < 0.280.


0.165 < σ₁²/σ₂² < 2.752.


(а) Simple; (b) composite; (c) composite; (d) composite; (e) simple; 
(f) composite; (g) composite; (h) composite. 
α = 0.1331, and β = 0.0159.
α = 0.223, and β = 0.451.
0.1139. 
K 

(1 = 6.) 
n 

&,( T bo) 

K 

1 - 6 
n 0 

1-6 ` 
They would be committing a type I error if they erroneously rejected the hypothesis θ = 0.60; they would be committing a type II error if they erroneously accepted the hypothesis θ = 0.60.
(a) 0.034; (b) 0.045; (c) 0.052.


x< 


(a) 0,0, and 2; (b) $ $, H, 5, and 0. 
(a) 0.852; (b) 0.016, 0.086, 0.145, 0.134, 0.122. 
(a) 0.0375, 0.0203, 0.0107, 0.0055, 0.0027; (b) 0.9329, 0.7585, 0.3840, 0.0419. 


(a) A = (2) gn - V. 
-2 ln λ = 7.845; reject the null hypothesis.


n = 151. 
z = 3.02; reject the null hypothesis. 
t = 5.66; reject the null hypothesis. 


Less than 0.145 or greater than 0.255; (a) 0.18; (b) 0.71; (c) 0.71; (d) 0.18.


z = -4.88; reject the null hypothesis.
t = 0.90; the null hypothesis cannot be rejected.
t = 2.33; null hypothesis cannot be rejected.
χ² = 5.92; null hypothesis cannot be rejected.


(a) χ² = 23; the null hypothesis cannot be rejected; (b) z = -0.8; the null hypothesis cannot be rejected.
F = 2.47; assumption was reasonable.


ko, = 11; 0.95, 0.87, and 0.75. 
koos = 12 and Кооз = 1; 0.97, 0.99, 0.98, and 0.96. 
Page 442


Page 457 


Page 469 


Page 478 


Page 490 


11. 
13. 
15. 


z = -1.30; null hypothesis cannot be rejected.
χ² = 0.7; difference is not significant.
χ² = 7.1; null hypothesis cannot be rejected.


χ² = 52.8; reject the null hypothesis.

χ² = 3.71; null hypothesis cannot be rejected.

χ² = 29.2; reject the null hypothesis.

(b) 0.0179, 0.1178, 0.3245, 0.3557, 0.1554, 0.0268, and 0.0019; (c) χ² = 1.44; an excellent fit.


tex 
Hya = —,— and рыу = бу. 


Hai = Sand pyo =}. 
4-28 


(a) ŷ = -5.964 + 1.554x; (b) 4.914.

(a) ŷ = 0.490 + 0.272x; (b) ŷ = 10.826.

(b) $ = 13 = фи (where u = x - 7) or f= 19 – x; ў = 147. 
Ў = 1370.38)". 


(a) = ix. (5) Å £ Luar Am 

t = 3.58; reject the null hypothesis.

J = 6.026  1,493x; since 1 = 3.85, the null hypothesis must be rejected. 
-0.21 < β < -0.051.

t = 3.08; the null hypothesis cannot be rejected.

=237 < а < 65.57. 

(a) 6.31 < $\mu_{y|3}$ < 9.67; (b) 3.31 < $y_0$ < 12.67.


r = 0.553; it is significant.
r = 0.727; it is significant. 
2.84 < β < 4.10.


(а) Й, = -0.627, B, = 0.0972, and f; = 0.662; (b) ў = 

ўт 19768 37191, = 0.120%: ўе 7205 
ŷ = -2.33 + 0.90x₁ + 1.27x₂ + 0.90x₃.

Ў = 38439 — 36.00x, + 0.896x;. 

72.2 < By < 1,4444. 

13.7 < B, < 46.5, 

0.244 < В, < 1.08. 

79,108 + 1,588. 

101.4 + 574, 
Page 505 7. F = 17.0; the differences are significant.
9. Р = 39.3; reject the null hypothesis; Д = 18.5, d, = 4, dy = 1.5, and й, = $5. 
11. F = 1.05; the differences can be attributed to chance. 


Page 515 5. (b)


7. For the diet foods F = 6.64, which is significant; for the laboratories F = 4.91, which is significant.

9. (a) F = 94.24, which is significant; (b) F = 2.56, which is not significant; (c) F = 27.04, which is significant.


Page 536 9. x = 4; reject the null hypothesis.
11. Т = 32.5; reject the null hypothesis. 
13. z = -2.07; reject the null hypothesis.
15. x = 11; cannot reject the null hypothesis.
17. z = 1.85; reject the null hypothesis.


5. The minimum value of W is 0, corresponding to R_i = k(n + 1)/2 for each i, and the maximum value of W is 1; W = 0 reflects a complete lack of association and agreement.
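Kendall's coefficient of concordance is commonly defined as W = 12S/[k²(n³ - n)]. Here k judges rank n objects, and S is the sum of the squared deviations of the rank sums R_i from their mean k(n + 1)/2. This standard definition is assumed here, and the book's notation may differ. The sketch below uses made-up rankings to reproduce the two extremes:

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W = 12S / [k^2 (n^3 - n)].

    rankings: k lists, each giving the ranks 1..n that one judge assigns to n objects.
    """
    k, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]  # R_1, ..., R_n
    mean_sum = k * (n + 1) / 2
    s = sum((ri - mean_sum) ** 2 for ri in rank_sums)
    return 12 * s / (k ** 2 * (n ** 3 - n))

# Made-up rankings of n = 4 objects.
agree = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]   # three judges agree completely
opposed = [[1, 2, 3, 4], [4, 3, 2, 1]]               # every rank sum equals k(n + 1)/2
print(kendalls_w(agree), kendalls_w(opposed))        # 1.0 0.0
```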
7. u = 17; reject the null hypothesis of randomness.

9. z = -0.10; cannot reject the null hypothesis of randomness.
13. z= 0-4; there is a trend. 
15. z = 0.01; cannot reject the null hypothesis. 
17. z = 3.55; the value of rs is significant. 
19. z = 1.78; cannot reject the null hypothesis of no correlation. 


Index 


Acceptance region, 387 
Addition rules of probability, 41, 
45 
Alternative hypothesis, 386 
composite, 386 
one-sided and two-sided, 412 
simple, 386 
Analysis of covariance, 519 
Analysis of variance (see One- 
way analysis of variance 
and Two-way analysis of 
variance) 
Asymptotic efficiency, 338 
Asymptotic property, 339 
Asymptotically unbiased 
estimator, 340 


Bar chart, 82 

Bayes criterion, 324 

Bayes risk, 324 

Bayes' theorem, 64 

Bayesian estimation, 349 

Bayesian inference, 215 

Bernoulli distribution, 176 
moments, 183 


Bernoulli trial, 177 
Beta distribution, 215 
mean, 216 
variance, 216 
Beta function, 216 
Biased estimator, 333 
Binomial coefficient, 13 
generalized, 21 
table, 579 
Binomial distribution, 178
mean, 179 


normal approximation, 227 
Poisson approximation, 193, 
194, 195 
table, 179, 207, 563 
variance, 179 
Binomial waiting-time
distribution, 187


function, 
Bivariate normal surface, 235 


Bivariate regression, 446 
Block effects, 510 

Block sum of squares, 511 
Blocks, 509

Boolean algebra, 555 


Cauchy distribution, 217 
Central limit theorem, 274, 275, 


table, 288, 576 

variance, 285 
Choices, multiplication rule, 2 
Circular normal distribution, 235 
Circular permutation, 7 
Classical probability concept, 25 
Coefficient of concordance, 549 




Column effect, 516 
Column sum of squares, 516 
Combinations, 9 
Complement, 31, 554 
Complete block design, 509 
Completely randomized design, 
508 
Composite hypothesis, 386 
Concordance, coefficient of, 549 
Conditional density, 123 
Conditional distribution, 122 
joint, 126 
Conditional distribution function, 
129 
Conditional expectation, 169, 
170 
Conditional mean, 170 
Conditional probability, 52, 54 
Confidence, degree of, 365 
Confidence coefficient, 365 
Confidence interval, 365, 368 
difference between means, 
370, 372 
difference between 
proportions, 378 
mean, 366 
mean of y at x = xy, 470 
proportion, 376 
ratio of two variances, 381
regression coefficients, 467, 
491 
variance, 380 
Confidence limits, 365 
Consistency criterion, 49 
Consistent estimator, 339 
Contingency coefficient, 442 
Contingency table, 438 
Continuity correction, 229 
Continuous random variable, 88 
Continuous sample space, 29 
Controlled experiment, 508 
Correlation analysis, 463 
normal, 463 
Correlation coefficient: 
population, 232 
sample, 474 
Count data, 429 
Countable sample space, 28 


Covariance, 161 
Cramér-Rao inequality, 335 
Critical region, 387 
inadmissible, 400
most powerful, 392 
size of, 387 
unbiased, 411 
uniformly more powerful, 400 
uniformly most powerful, 400 
Cumulant, 236 
Cumulative distribution (see 
Distribution function) 


de Morgan law, 557 
Decision function, 321 
inadmissible, 322 
Decision theory, 306 
Degrees of freedom: 
chi-square distribution, 213, 
285 
contingency table, 438 
F distribution, 292 
goodness of fit, 441 
one-way analysis of variance, 
503 
t distribution, 289 
two-way analysis of variance, 
512 
Density (see Probability density 
function) 
Dependent events, 59 
Difference between means: 
confidence interval for, 370, 
372 
test of hypothesis, 418, 420 
Difference between proportions: 
confidence interval for, 378
test of hypothesis, 435 
Differences among k proportions, 
432 
Discrete random variable, 77 
Discrete uniform distribution, 
175, 176 
Distribution (see Probability 
density function, 
Probability distribution, 
and individually listed 


probability density 
functions and 
distributions) 
Distribution-free methods, 521 
Distribution function, 82, 94 
conditional, 129 
joint, 105, 107 
marginal, 122, 130 
marginal, 119, 129, 130 
Distribution function technique, 
240 
Dominated strategy, 310 


Efficiency, 337 
asymptotic, 338 
relative, 337 

Elimination, rule of, 63 

Empty set, 32 

Equalizer principle, 325 

Equitable game, 145, 312 

Error mean square, 502 

Error sum of squares, 501 

Errors, Type I and Type II, 387

Estimate: 
interval, 364 
point, 333 

Estimation, 332 
Bayesian, 349 
interval, 364 
least squares, 349, 453 
point, 333 

Estimator, 333 
biased, 333 
consistent, 339 
least squares, 453 
maximum likelihood, 350 
minimum variance, 335 
pooled, 371 
relatively more efficient, 337 
sufficient, 341 
unbiased, 333 

asymptotically, 340 

Events, 30 
dependent, 59 
independent, 58, 60 

pairwise, 60 
mutually exclusive, 32 


Expectation, 134 

conditional, 169, 170 
Expected cell frequencies, 433 
Expected value, 135 
Experiment, 27 
Experimental design, 498 
Experimental error, 501 
Exponential distribution, 211 

mean, 214 

variance, 214 

and waiting times, 212 


F distribution, 292 
degrees of freedom, 292 
mean, 297 
table, 293, 575 

Factorial moment-generating 

function, 182, 184 

Factorial moments, 184 

Factorial notation, 5 

Factorials, table, 579 

Failure rate, 190, 218 

Fair game, 145 

Finite game, 309 

Finite population, 278 
correction factor, 280 
mean, 279 
variance, 279 

Finite sample space, 28 

Frequency interpretation of 

probability, 26 

Functions of random variables, 

240 


Game: 
equitable, 312 
finite, 309 
payoff, 310 
matrix, 310 
saddle point, 313 
statistical, 320 
strictly determined, 313
two-person, 309 
value of, 310 
zero-sum, 309 
Gamma distribution, 210 


mean, 214 
moment-generating function, 
214 
moments, 213 
variance, 214 
Gamma function, 210 
Gaussian distribution (see 
Normal distribution) 
Geometric distribution, 189 


variance, 190 
Goodness of fit, 440 
Grand mean, 499, 510 


H test, 535 
Histogram, 80 
Hypergeometric distribution, 190 
binomial approximation, 192 
mean, 191 
multivariate, 205 
variance, 191 
Hypothesis: 
alternative, 386 
composite, 386 
null, 387 
simple, 386 
statistical, 386 
Hypothesis testing, 385 


Latin square, 515 
Least squares, method of, 349, 453
Marginal distribution, 119
joint, 120 
Marginal distribution function, 
119, 129, 130 
joint, 122 
Markov's inequality, 158 
Mathematical expectation, 134 
Maximum likelihood, method of,
350 
Maximum likelihood estimator, 
350 
invariance property, 354 
Mean: (see also means listed for 
special probability 
distributions and 
probability densities) 
conditional, 170 
confidence interval for, 366, 
368 
population, 279 
random variable, 147 
sample, 272, 273 
standard error, 274 
standardized, 274 
test concerning, 415, 417 
Mean square, 502 
Mean square error, 338 
Median, 98 
population, 301 
sample, 300 
Method of least squares, 349, 
453 
Method of maximum likelihood, 
349, 350 
Method of moments, 349 
Mid-range, 347 
Minimax criterion, 312, 324 
Minimax strategy, 312 
Minimum variance unbiased 
estimator, 335 
Mixed random variable, 96 
Mixed strategy, 314 
Moment-generating function, 153 
(see also moment- 
generating functions of 
individual probability 
distributions and 


probability densities) 
factorial, 182, 184 
joint, 172 
Moment-generating function 
technique, 240 
Moments, 146 
about the mean, 147 
about the origin, 146 
factorial, 184 
product, 160, 161 
sample, 349 
Moments, method of, 349 
Most powerful critical region, 
392 
uniformly, 400 
Multinomial coefficients, 17 
Multinomial distribution, 203 
Multiple comparisons tests, 515 
Multiple regression, 447, 457 
linear, 480 
Multiple regression equation, 457 
Multiplication rules, probability, 
56, 57 
Multi-stage test, 401 
Multivariate hypergeometric 
distribution, 205 
Multivariate normal distribution, 
231 
Mutually exclusive events, 32 


Negative binomial distribution, 
187 
and binomial distribution, 188 
mean, 189 
moment-generating function, 
268 
variance, 189 
Neyman-Pearson lemma, 392 
Neyman-Pearson theory, 391 
Nonparametric methods, 521 (see 
also individual tests) 
Normal correlation analysis, 473 
Normal distribution, 221 
approximation of binomial 
distribution, 227 
bivariate, 231 


moment-generating function, 
237 
circular, 235 
mean, 223 
moment-generating function, 
222 
multivariate, 231 
standard, 223 
table, 223, 574 
variance, 223 
Normal equations, 454, 481 
Normal regression analysis, 463 
Null hypothesis, 387 


Observed cell frequencies, 433 
Occupancy theory, 19 
Odds, 47 
fair, 367 
One-sample sign test, 522 
One-sample t test, 417 
One-sided alternative, 412 
One-sided confidence interval, 
367 
One-tailed test, 412 
One-way analysis of variance, 
499, 503 
computing formulas, 503 
identity, 500 
mean squares, 502 
model, 499 
sums of squares, 503 
table, 503 
unequal sample sizes, 505
Operating characteristic curve, 
400
Opportunity loss, 317 
Optimum strategy, 310 
Order statistics, 299 
Outcome, 27 


Paired-sample sign test, 524 
Pairwise independent events, 60 
Parabola, 495 

Parameter, 175 

Parameter space, 402 


Pareto distribution, 218 
Partition, 11 
Pascal distribution, 187 
Pascal's triangle, 16, 19 
Payoff, 310 
matrix, 310 
Pearson curves, 219 
Permutations, 5 
circular, 7 
Personal probability, 26 
consistency criterion, 49
Petersburg paradox, 145 
Pivotal method, 369 
Point estimate, 333 
Poisson distribution, 194 
approximation of binomial 
distribution, 193, 194, 
195 
mean, 195 
moment-generating function, 
196 
table, 207, 568
variance, 195 
Poisson process, 196, 212 
Pooled estimator, 371 
Pooling, 371 
Population, 271 
finite, 278 
size, 278 
infinite, 271 
median, 301 
mean, 279
variance, 279 
Population distribution, 271 
Posterior distribution, 356 
Postulates of probability, 36 
Power, 392 
Power function, 398 
Prior distribution, 356 
Prior probabilities, 66 
Probability: 
addition rules, 41, 45 
and odds, 47 
Bayes' theorem, 64 
classical concept, 25 
conditional, 52, 54 
consistency criterion, 49 


frequency interpretation, 26 
multiplication rules, 56, 57 
personal, 26 
postulates, 36 
prior, 66 
subjective, 26 
Probability density function, 92 
conditional, 123 
joint, 106 
marginal, 119 
Probability distribution, 79 
conditional, 122 
joint, 94 
marginal, 119 
Probability function, 79 
Probability integral 
transformation, 253 
Probability measure, 36 
Product moments, 160 
about mean, 161 
Proportion: 
confidence interval for, 376 
test concerning, 429, 430 
Pure strategy, 314 


Random sample: 

finite population, 278 

infinite population, 272 
Random variable, 74, 75 

continuous, 88 

discrete, 77 

mixed, 96 

standardized, 157 
Randomization, 508 
Randomized block design, 509 
Randomized strategy, 314 
Range, sample, 303 
Rank correlation coefficient, 546 
Ratio of two variances, 

confidence interval for, 
381 

Rayleigh distribution, 218 
Regression, 446 

bivariate, 446 

linear, 450 

multiple, 447 
Regression analysis, 463
normal, 463 
Regression equation, 447 
multiple, 457 
Regret, 317 
Rejection region, 387 
Relatively more efficient 
estimator, 337 
Repeated trials, 177 (see also 
Binomial distribution) 
Risk function, 322 
Row effects, 516 
Row sums of squares, 516 
Rule of elimination, 63 
Runs, 541 
above and below the median, 
544 
Runs test, table, 585 


Saddle point, 313 
Sample, 271 
random, 272, 278 
size, 274 
Sample correlation coefficient, 
474 
Sample mean, 272, 273 
Sample median, 300 
Sample moment, 349 
Sample point, 27 
Sample range, 303
Sample space, 27 
continuous, 29 
countable, 28 
discrete, 28 
finite, 28 
Sampling distribution, 273 
Sampling without replacement, 
56 
Scattergram, 476 
Sequential test, 401 
Sign test: 
one-sample, 522 
paired sample, 524 
Signed-rank test, 525 
table, 582 
Significance test, 401 
Simple hypothesis, 386
Simulation, 253 
Size of critical region, 387 
Skewness, 148 
Smith-Satterthwaite test, 445 
Spearman's rank correlation 
coefficient, 546 
Standard deviation, 147 
Standard error of estimate, 469 
Standard error of the mean, 274 
Standard form of distribution, 
157 
Standard normal distribution, 223 
(see also Normal 
distribution) 
Standardized mean, 274 
Standardized random variable, 
157, 229 
Statistic, 387 
Statistical game, 320 
Statistical hypothesis, 386 
Statistical model, 385 
Stirling's formula, 18 
Strategy, 309 
dominated, 310 
minimax, 312 
mixed, 312 
optimum, 310 
pure, 314 
randomized, 314
Strictly determined game, 313 
Student-t distribution (see t 
distribution) 
Subjective probability, 26 
consistency criterion, 49 
Sufficient estimator, 341 
Symmetry of distribution, 148, 
157 


t distribution, 289


degrees of freedom, 289 
table, 575 
variance, 297 
Test statistic, 387 
Tests of hypotheses, 332, 385 
one-tailed, 412 
two-tailed, 412 
Tests of significance, 401 
Theory of games, 309 
Theory of runs, 541 
Tolerance limits, 303 
Total sums of squares, 501 
Transformation of variables 
technique, 240, 247 
Treatment, 500 
effect, 499, 510 
mean square, 502 
sum of squares, 501 
Tree diagram, 3 
Trial, Bernoulli, 177 
Triangular probability density, 
260 
Two-person game, 309 
Two-sample t test, 420 
Two-sided alternative, 412 
Two-tailed test, 412 
Two-way analysis of variance, 
509, 512 
computing formulas, 513 
identity, 511 
model, 510 
sums of squares, 511 
table, 512
Type I and Type II errors, 387


U test, 530 
table, 583 
Unbiased critical region, 411 
Unbiased estimator, 333 
minimum variance, 335 


Uncorrelated random variables, 
234 
Uniform density, 208 
mean, 209 
variance, 209 
Uniform distribution, discrete, 
175, 176 
Uniformly more powerful critical 
region, 400 
Uniformly most powerful critical 
region, 400 
Union, 31, 554 


Value, expected, 135 
Value of game, 310
Variance: 
confidence interval for, 380 
population, 279 
random variable, 147 
sample, 272, 273
test concerning, 425 
Variance ratio distribution, 295 
(see also F distribution) 
Venn diagram, 32 


Waiting time distribution, 
binomial, 187 
Weibull distribution, 218 
Weighted mean, 338 
Wilcoxon signed-rank test, 525 
table, 582 
Wilcoxon test, 530 
table, 583 


Zero-sum two-person game, 309 